Modeling sentiments and preferences from multimodal data
Main Author:
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects:
Online Access: https://ink.library.smu.edu.sg/etd_coll/389
https://ink.library.smu.edu.sg/context/etd_coll/article/1387/viewcontent/GPIS_AY2017_PhD_Truong_Quoc_Tuan.pdf
Institution: Singapore Management University
Summary: Online reviews are prevalent in many modern Web applications, such as e-commerce, crowd-sourced location and check-in platforms. Fueled by the rise of mobile phones that are often the only cameras on hand, reviews are increasingly multimodal, with photos in addition to textual content. In this thesis, we focus on modeling the subjectivity carried in this form of data, with two research objectives.
In the first part, we tackle the problem of detecting the sentiment expressed by a review. This capability unlocks many applications, e.g., analyzing opinions, monitoring consumer satisfaction, and assessing product quality. Traditionally, sentiment analysis relies primarily on textual content. We focus instead on the visual sentiment of review images and develop models to systematically analyze the impact of three factors: image, user, and item. Further investigation leads to a notion of concept orientation that generalizes visual sentiment analysis to Web images. We then observe that, with respect to sentiment detection, images often play a supporting role to text, highlighting the salient aspects of an entity rather than expressing sentiment independently. Therefore, we develop a visual aspect attention mechanism that uses visual information as an alignment signal to point out the important sentences of a document. The method is effective in scenarios where one document is associated with multiple images, such as online reviews, blog posts, social network posts, and media articles. Furthermore, we study the use of sentiment as an independent modality in the context of cross-modal retrieval. We first formulate the problem of sentiment-oriented text-to-image retrieval and then propose two metric-learning approaches for incorporating sentiment into text queries. Each approach embodies a hypothesis about how sentiment vectors align in the metric space that also contains text and visual vectors.
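To make the visual aspect attention idea concrete, the following is a minimal sketch, not the thesis's actual architecture: sentences are scored against review images in a shared embedding space, and the attention weights select which sentences contribute to the document representation. All function and variable names are hypothetical, and we assume sentence embeddings and image features have already been projected to the same dimensionality.

```python
import numpy as np

def visual_aspect_attention(sent_embs, img_feats):
    """Pool a document's sentences, weighted by their alignment with images.

    sent_embs: (n_sentences, d) sentence embeddings
    img_feats: (n_images, d) image features in the same space
    Returns a (d,) document representation.
    """
    # Affinity between every image and every sentence (dot products).
    scores = img_feats @ sent_embs.T            # (n_images, n_sentences)
    # For each sentence, keep its strongest visual alignment.
    salience = scores.max(axis=0)               # (n_sentences,)
    # Softmax over sentences yields attention weights.
    w = np.exp(salience - salience.max())
    w /= w.sum()
    # Attention-weighted sum of sentence embeddings.
    return w @ sent_embs                        # (d,)
```

Sentences that no image aligns with receive low weight, which matches the intuition that images highlight the salient aspects a reader should attend to.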
In the second part, we focus on developing models that capture user preferences from multimodal data. Preference modeling is crucial to recommender systems, which are core to modern online user-based platforms: recommendations guide users through the myriad of options offered to them. In online reviews, for instance, preference manifests in numerical ratings, textual content, and images. First, we hypothesize that modeling these modalities jointly yields a more holistic representation of a review and, in turn, more accurate recommendations. We therefore propose an approach that captures user preferences by simultaneously modeling a rating prediction component and a review text generation component. Second, we introduce a new generative model of preferences, inspired by the dyadic nature of preference signals. The model is bilateral, making it better suited to bipartite interactions and allowing easy incorporation of auxiliary data from both the user and item sides. Third, we develop a probabilistic framework for modeling preferences from logged bandit feedback. It addresses the sparsity issue in learning from bandit feedback on publisher sites by leveraging relevant organic feedback from e-commerce sites. Through empirical evaluation, we demonstrate that the proposed framework is effective for recommendation and ad placement systems.
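The joint rating-plus-review modeling in the first contribution can be summarized as a weighted multi-task objective. The sketch below is a simplification under assumed inputs (a predicted rating and the log-probabilities a text decoder assigns to the observed review tokens); the function name and the weighting scheme are hypothetical, not the thesis's exact formulation.

```python
import numpy as np

def joint_loss(pred_rating, true_rating, token_log_probs, lam=1.0):
    """Joint objective: squared rating error plus the negative
    log-likelihood of the review text, weighted by lam.

    token_log_probs: log-probabilities assigned by the review text
    decoder to the observed review tokens.
    """
    l_rating = (pred_rating - true_rating) ** 2   # rating prediction term
    l_text = -np.sum(token_log_probs)             # review generation term
    return l_rating + lam * l_text
```

Training both components against a shared user/item representation is what lets the textual signal regularize the rating predictor, rather than the two tasks being fit independently.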
In general, we present multiple approaches to modeling various aspects of sentiment and preference signals from multimodal data. Our work contributes a set of techniques that are broadly extensible to mining Web data. Additionally, this research facilitates the development of recommender systems, which play a significant role in many online user-based platforms.