Sentiment analysis using image, text and video

Bibliographic Details
Main Author: Chen, Qian
Other Authors: Erik Cambria
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access: https://hdl.handle.net/10356/161285
Physical Description
Summary: Emotions and sentiments play a pivotal role in modern society. In most human-centric environments, they are essential for decision-making, communication, and situational awareness. With the explosive growth of social media (text, image and video) carrying sentiment polarities toward specific subjects (e.g., product reviews, political views and depressive emotions), sentiment analysis has evolved into a core component technology in many industries. People can present their experiences and feelings through images, and increasingly prefer images to plain text. Compared with text, images provide more cues that better reflect people's sentiments and give a more perceptual intuition of sentiment. In particular, for the depression recognition problem in the healthcare field, images containing human faces convey emotions more intuitively through facial expressions. Hence, predicting sentiment from visual cues is complementary to textual sentiment analysis.

In this dissertation, studies are conducted to explore sentiment analysis on media data ranging from images and image-text pairs to video. We start from sentiment analysis on image data to explore sentiment polarities. We then investigate sentiment analysis on images together with their tags/captions, as these two data modalities jointly provide more cues for improved sentiment analysis. Finally, we explore human emotions further and address depression analysis on face videos. The main contributions of this thesis can be summarized as follows.

Firstly, a single image may contain several concepts. To model the sequence of sentiments carried by these concepts, we employ a Recurrent Neural Network (RNN) in addition to a Convolutional Neural Network (CNN). The proposed Convolutional Recurrent Image Sentiment Classification (CRISC) model can analyze the sentiments of the context within one image without using labels for the visual concepts.

Secondly, to exploit the benefit of text data for image sentiment analysis, we propose to extract visual features by fine-tuning a 2D-CNN pre-trained on a large-scale image dataset, and to extract textual features using AffectiveSpace of English concepts. We propose a novel sentiment score that combines the image and text predictions, and evaluate our model on a dataset of images with corresponding labels and captions. We show that merging scores from the text and image models yields higher accuracy than either system alone.

Finally, we investigate multimodal facial depression representation using facial dynamics and facial appearance. To mine the correlated and complementary depression patterns in multimodal learning, we adopt a chained-fusion mechanism to jointly learn facial appearance and dynamics in a unified framework.

In summary, this dissertation presents our studies on image sentiment analysis, with a particular focus on facial depression recognition.
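The abstract's first contribution pairs a CNN with an RNN so that the several concepts in one image are modelled as a sequence. The thesis's actual CRISC architecture is not specified here; the following is a minimal PyTorch sketch of that general idea, in which the spatial positions of a CNN feature map stand in for concept regions and an LSTM aggregates them. All layer sizes and the ResNet-18 backbone are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class CRISCSketch(nn.Module):
    """Hypothetical CNN+RNN image sentiment classifier, not the thesis's
    exact CRISC model. Spatial feature-map positions are treated as a
    sequence of "concept" features and fed to an LSTM, so the sentiments
    of several concepts in one image can be modelled without any
    concept-level labels."""

    def __init__(self, num_classes=2, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Drop the average pool and classifier to keep the spatial map.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, images):                 # images: (B, 3, H, W)
        fmap = self.cnn(images)                # (B, 512, h, w)
        seq = fmap.flatten(2).transpose(1, 2)  # (B, h*w, 512): one step per region
        _, (h_n, _) = self.rnn(seq)            # last hidden state summarizes the sequence
        return self.fc(h_n[-1])                # sentiment logits

logits = CRISCSketch()(torch.randn(2, 3, 224, 224))  # shape (2, 2)
```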
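The second contribution combines image and text predictions through a sentiment score. The abstract does not give the score's formula, so the sketch below shows the generic late-fusion pattern it describes: a weighted combination of per-class probabilities from the two models. The function name and the weight `alpha` are assumptions for illustration only.

```python
import numpy as np

def fused_sentiment_score(p_image, p_text, alpha=0.6):
    """Hypothetical late-fusion score, not the thesis's actual formula.
    p_image, p_text: per-class probabilities from the image and text
    models; alpha: assumed weight on the image model. Returns the fused
    class index and the fused score vector."""
    p_image, p_text = np.asarray(p_image), np.asarray(p_text)
    fused = alpha * p_image + (1 - alpha) * p_text
    return int(fused.argmax()), fused

# e.g. image model leans positive, text model leans negative:
label, scores = fused_sentiment_score([0.7, 0.3], [0.4, 0.6])
```

The point of such a score is exactly the abstract's claim: when the two modalities disagree, the fused decision can still be more accurate than either model alone, because the modalities' errors are only partially correlated.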
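The third contribution's chained-fusion mechanism is described only as jointly learning facial appearance and dynamics. One plausible reading, sketched below under that assumption, is that one modality is encoded first and its representation is chained into the other branch before a shared prediction head. The class name, feature dimensions, and the regression output are all hypothetical.

```python
import torch
import torch.nn as nn

class ChainedFusionSketch(nn.Module):
    """Hypothetical chained-fusion head, not the thesis's exact design.
    The dynamics branch is encoded first and its representation is
    concatenated into the appearance branch, so correlated and
    complementary patterns can be learned jointly."""

    def __init__(self, dyn_dim=128, app_dim=128, hidden=64):
        super().__init__()
        self.dyn_enc = nn.Sequential(nn.Linear(dyn_dim, hidden), nn.ReLU())
        self.app_enc = nn.Sequential(nn.Linear(app_dim + hidden, hidden), nn.ReLU())
        self.regressor = nn.Linear(hidden, 1)  # e.g. a depression severity score

    def forward(self, dyn_feat, app_feat):
        z_dyn = self.dyn_enc(dyn_feat)                          # encode dynamics first
        z_app = self.app_enc(torch.cat([app_feat, z_dyn], -1))  # chain into appearance
        return self.regressor(z_app)

score = ChainedFusionSketch()(torch.randn(4, 128), torch.randn(4, 128))  # (4, 1)
```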