Decoding ‘Impulsion of Movie's Content-Based Factors in Multi-Modal Movie Recommendation System’

By Brijraj Singh, Research Scientist at Sony Research India

29th November 2023

In this blog, Brijraj Singh summarises the paper titled ‘Impulsion of Movie’s Content-Based Factors in Multi-Modal Movie Recommendation System’, co-authored by Prabir Mondal, Pulkit Kapoor, Siddharth Singh, Sriparna Saha and Naoyuki Onoe. The paper was accepted at the International Conference on Neural Information Processing (ICONIP), held in Changsha, China, from 20th to 23rd November 2023.

Introduction

This research paper focuses on movie recommendation, a pivotal component of modern streaming platforms with extensive film libraries. It highlights a significant limitation of existing approaches: they treat user inputs as uniform, even though individual users perceive movies differently, influenced by factors such as genre, story, director, and cast. To address this, the authors introduce two novel metrics: TextLike_score (TL_score) and GenreLike_score (GL_score). These scores play a critical role in their Cross-Attention-based model, which outperforms current state-of-the-art recommendation systems by taking these nuanced user preferences into account.

The research is supported by evaluations conducted on two diverse datasets: MovieLens-100K (ML-100K) and MFVCD-7K. Notably, the authors leverage multi-modal data, including audio, video, and textual information, to calculate the introduced scores. Their experimental results affirm that their Cross-Attention-based multi-modal recommendation system, incorporating the Meta_score, effectively addresses user preferences, making it a compelling solution for real-time movie recommendations.

The paper emphasises the importance of understanding user preferences in the digital age, particularly for streaming services. With the proliferation of personal viewing devices and the rise of OTT platforms, the demand for tailored movie recommendations keeps growing. Traditional recommendation systems have focused on predicting user-movie ratings from embeddings derived from text, audio, or video data, ignoring the nuances of user preferences for genres, directors, cast, and storylines.

Findings

To address the limitations mentioned above, the authors propose the introduction of TextLike_score (TL_score) and GenreLike_score (GL_score) as parameters to quantify textual content and genre preferences. Unlike conventional methods, their model applies a Meta_score to user-movie embeddings, ensuring a more accurate representation of user preferences.

The authors’ innovative approach involves a Cross-Attention-based rating prediction model that considers audio and video embeddings of movies, combined with user embeddings. A self-attention-based fusion technique, complemented by multi-head cross-attention, facilitates the merging of these different modalities. Their model’s superiority is demonstrated through empirical evaluation on the ML-100K and MFVCD-7K datasets, reaffirming its effectiveness in real-time scenarios.
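The cross-attention step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding size, the number of heads, the use of the user embedding as the query, and the randomly initialised projection matrices (standing in for learned weights) are all assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query, context, num_heads, rng):
    """Scaled dot-product cross-attention with several heads.

    query:   (n_q, d) array, e.g. a user embedding (assumed role)
    context: (n_c, d) array, e.g. audio and video embeddings of a movie
    Returns the attended output (n_q, d) and the attention
    weights (num_heads, n_q, n_c).
    """
    n_q, d = query.shape
    assert d % num_heads == 0, "embedding size must be divisible by num_heads"
    d_head = d // num_heads

    # Hypothetical projections: random matrices stand in for learned weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(query @ w_q)
    k = split_heads(context @ w_k)
    v = split_heads(context @ w_v)

    # Each query position attends over every context position, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v

    # Merge the heads back into a single (n_q, d) representation.
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)
    return merged @ w_o, weights

rng = np.random.default_rng(0)
user = rng.standard_normal((1, 16))              # one user embedding
movie_modalities = rng.standard_normal((2, 16))  # rows: audio, video
fused, weights = multi_head_cross_attention(
    user, movie_modalities, num_heads=4, rng=rng
)
print(fused.shape, weights.shape)
```

In this sketch each head produces a weighting over the two modality embeddings, so the fused vector can lean more on audio or on video depending on the user. In a trained model the projections would be learned and the fused vector would feed a rating predictor.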

The research paper concludes by hinting at future extensions, including multitask settings such as movie genre prediction and user age/gender prediction, alongside user-movie rating prediction, highlighting the potential for even more refined and personalised recommendation systems.

To know more about Sony Research India’s Research Publications, visit the ‘Publications’ section on our ‘Open Innovation’ page:

Open Innovation with Sony R&D – Sony Research India

Fig: Process diagram of LLM based Recommendation System

In a separate paper, we explore movie recommendation and establish the concept of CDHRNN (Content Driven Hierarchical Recurrent Neural Network), which considers the description of a movie along with its movie_Id. This improves recommendation performance because the items are better represented. A movie_Id or user_Id is just a unique number, so it carries no information that could tell us whether two movies are similar or dissimilar. When the Id is considered together with the description, that question can be answered, which helps improve recommendation performance. In video recommendation, where the item is a movie, finding the description or plot is not very challenging, as it can be web-scraped from a repository like IMDb. However, there can be scenarios, in other recommendation domains as well, where item descriptions cannot be crawled. For such cases we propose using an LLM (Large Language Model), which has received exhaustive training on existing popular content. Specifically, we used Alpaca-Lora, with prompting that sets the context so that the model provides the desired description or plot of the movie.

Since the performance of an LLM depends on the online content it encountered during training, plots should only be requested for contemporary movies. If we query the plot of an old and unpopular movie, the model may not have been trained on that content; it will then try to produce a description based on what it has learned about similar items, which might not be relevant. This problem, in which an LLM responds based on irrelevant understanding, is known as hallucination. Hence, an LLM helps in generating content only under certain assumptions.
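As a rough illustration of the prompting step, the sketch below builds an instruction-style prompt of the kind commonly used with Alpaca-style models. The exact prompt used in the paper is not given here, so the template wording, the `build_plot_prompt` helper, and the instruction text (including the request to answer 'unknown' rather than guess) are assumptions. No model is called.

```python
# Instruction-style template modelled on the Alpaca prompt format (assumed).
ALPACA_STYLE_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_plot_prompt(title, year=None):
    """Return a prompt asking for a short plot description of a movie.

    Hypothetical helper: the instruction wording is illustrative only.
    """
    label = f"{title} ({year})" if year is not None else title
    instruction = (
        f"Write a short plot description of the movie '{label}'. "
        "If you do not know this movie, reply with 'unknown' "
        "instead of guessing."
    )
    return ALPACA_STYLE_TEMPLATE.format(instruction=instruction)

# Illustrative title; any item name from the catalogue could be used.
prompt = build_plot_prompt("Toy Story", 1995)
print(prompt)
```

Asking the model to decline when it does not recognise the title is one simple way to reduce the hallucination risk, although it does not remove it. The generated text would then be used as the item description in place of a crawled plot.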

For more information on the paper, ‘LLM Based Generation of Item-Description for Recommendation System’, visit: https://dl.acm.org/doi/abs/10.1145/3604915.3610647