Sentiment analysis using image, text and video

Emotions and sentiments play a pivotal role in modern society. In most human-centric environments, they are essential for decision-making, communication, and situational awareness. With the explosive growth of social media (text, images and video) expressing sentiment polarities toward specific subjects (e.g., product reviews, political views and depressive emotions), sentiment analysis has increasingly become a component technology in many industries. People present their experiences and feelings through images, and increasingly prefer images to text alone. Compared with text, images provide additional cues that better reflect people's sentiments and give a more perceptual intuition of sentiment. Particularly for depression recognition in the healthcare field, images containing human faces convey emotions more intuitively through facial expressions. Hence, predicting sentiment from visual cues is complementary to textual sentiment analysis.

This dissertation explores sentiment analysis on media data ranging from images, to image-text pairs, to video. We start from sentiment analysis on image data to explore sentiment polarities. We then investigate sentiment analysis on images together with their tags/captions, as these two modalities provide more cues for improved sentiment analysis. Last, we study human emotions further and address depression analysis on face videos.

The main contributions of this thesis are as follows. Firstly, a single image may contain several concepts. To model the sequence of sentiments carried by these concepts, we combine a Recurrent Neural Network (RNN) with a Convolutional Neural Network (CNN). The proposed Convolutional Recurrent Image Sentiment Classification (CRISC) model analyzes the sentiment of the context in an image without requiring labels for the visual concepts. Secondly, to exploit text data for image sentiment analysis, we extract visual features by fine-tuning a 2D-CNN pre-trained on a large-scale image dataset and extract textual features using AffectiveSpace of English concepts. We propose a novel sentiment score that combines the image and text predictions, and evaluate our model on a dataset of images with corresponding labels and captions. We show that the accuracy obtained by merging the text and image scores is higher than that of either system alone. Finally, we investigate multimodal facial depression representation using facial dynamics and facial appearance. To mine the correlated and complementary depression patterns across modalities, we adopt a chained-fusion mechanism that jointly learns facial appearance and dynamics in a unified framework. This dissertation thus presents our studies on image sentiment analysis, with a particular focus on facial depression recognition.
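
The first contribution couples a CNN with an RNN so that the several concepts in a single image are treated as a sequence of sentiment cues. The exact CRISC architecture is not given in this record, so the following Python code is only a minimal sketch of that general idea under assumed choices: a ResNet-18 backbone (an assumption), an LSTM run over the spatial positions of its feature map, two sentiment classes and illustrative layer sizes.

# Minimal CNN + RNN image-sentiment sketch (illustrative only; not the thesis's
# actual CRISC configuration). Backbone, layer sizes and class count are assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ConvRecurrentSentiment(nn.Module):
    def __init__(self, hidden_size=256, num_classes=2):
        super().__init__()
        backbone = resnet18(weights=None)                           # pretrained weights optional
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep the spatial feature map
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, images):                         # images: (B, 3, H, W)
        fmap = self.cnn(images)                        # (B, 512, h, w)
        seq = fmap.flatten(2).transpose(1, 2)          # (B, h*w, 512): one step per image region
        _, (h_n, _) = self.rnn(seq)                    # final hidden state summarises the regions
        return self.classifier(h_n[-1])                # (B, num_classes) sentiment logits

logits = ConvRecurrentSentiment()(torch.randn(2, 3, 224, 224))
print(logits.shape)                                    # torch.Size([2, 2])

Reading the feature map position by position is only one simple way to impose an order on the visual concepts; the thesis may use a different sequencing or region scheme.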

Bibliographic Details
Main Author: Chen, Qian
Other Authors: Erik Cambria
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/161285
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-161285
record_format dspace
spelling sg-ntu-dr.10356-161285 2022-09-01T02:33:19Z Sentiment analysis using image, text and video Chen, Qian Erik Cambria School of Computer Science and Engineering Erik Cambria cambria@ntu.edu.sg Engineering::Computer science and engineering Emotions and sentiments play a pivotal role in modern society. In most human-centric environments, they are essential for decision-making, communication, and situational awareness. With the explosive growth of social media (text, images and video) expressing sentiment polarities toward specific subjects (e.g., product reviews, political views and depressive emotions), sentiment analysis has increasingly become a component technology in many industries. People present their experiences and feelings through images, and increasingly prefer images to text alone. Compared with text, images provide additional cues that better reflect people's sentiments and give a more perceptual intuition of sentiment. Particularly for depression recognition in the healthcare field, images containing human faces convey emotions more intuitively through facial expressions. Hence, predicting sentiment from visual cues is complementary to textual sentiment analysis. This dissertation explores sentiment analysis on media data ranging from images, to image-text pairs, to video. We start from sentiment analysis on image data to explore sentiment polarities. We then investigate sentiment analysis on images together with their tags/captions, as these two modalities provide more cues for improved sentiment analysis. Last, we study human emotions further and address depression analysis on face videos. The main contributions of this thesis are as follows. Firstly, a single image may contain several concepts. To model the sequence of sentiments carried by these concepts, we combine a Recurrent Neural Network (RNN) with a Convolutional Neural Network (CNN). The proposed Convolutional Recurrent Image Sentiment Classification (CRISC) model analyzes the sentiment of the context in an image without requiring labels for the visual concepts. Secondly, to exploit text data for image sentiment analysis, we extract visual features by fine-tuning a 2D-CNN pre-trained on a large-scale image dataset and extract textual features using AffectiveSpace of English concepts. We propose a novel sentiment score that combines the image and text predictions, and evaluate our model on a dataset of images with corresponding labels and captions. We show that the accuracy obtained by merging the text and image scores is higher than that of either system alone. Finally, we investigate multimodal facial depression representation using facial dynamics and facial appearance. To mine the correlated and complementary depression patterns across modalities, we adopt a chained-fusion mechanism that jointly learns facial appearance and dynamics in a unified framework. This dissertation thus presents our studies on image sentiment analysis, with a particular focus on facial depression recognition. Doctor of Philosophy 2022-08-24T02:30:20Z 2022-08-24T02:30:20Z 2022 Thesis-Doctor of Philosophy Chen, Q. (2022). Sentiment analysis using image, text and video. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/161285 https://hdl.handle.net/10356/161285 10.32657/10356/161285 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Chen, Qian
Sentiment analysis using image, text and video
description Emotions and sentiments play a pivotal role in modern society. In most human-centric environments, they are essential for decision-making, communication, and situational awareness. With the explosive growth of social media (text, images and video) expressing sentiment polarities toward specific subjects (e.g., product reviews, political views and depressive emotions), sentiment analysis has increasingly become a component technology in many industries. People present their experiences and feelings through images, and increasingly prefer images to text alone. Compared with text, images provide additional cues that better reflect people's sentiments and give a more perceptual intuition of sentiment. Particularly for depression recognition in the healthcare field, images containing human faces convey emotions more intuitively through facial expressions. Hence, predicting sentiment from visual cues is complementary to textual sentiment analysis. This dissertation explores sentiment analysis on media data ranging from images, to image-text pairs, to video. We start from sentiment analysis on image data to explore sentiment polarities. We then investigate sentiment analysis on images together with their tags/captions, as these two modalities provide more cues for improved sentiment analysis. Last, we study human emotions further and address depression analysis on face videos. The main contributions of this thesis are as follows. Firstly, a single image may contain several concepts. To model the sequence of sentiments carried by these concepts, we combine a Recurrent Neural Network (RNN) with a Convolutional Neural Network (CNN). The proposed Convolutional Recurrent Image Sentiment Classification (CRISC) model analyzes the sentiment of the context in an image without requiring labels for the visual concepts. Secondly, to exploit text data for image sentiment analysis, we extract visual features by fine-tuning a 2D-CNN pre-trained on a large-scale image dataset and extract textual features using AffectiveSpace of English concepts. We propose a novel sentiment score that combines the image and text predictions, and evaluate our model on a dataset of images with corresponding labels and captions. We show that the accuracy obtained by merging the text and image scores is higher than that of either system alone. Finally, we investigate multimodal facial depression representation using facial dynamics and facial appearance. To mine the correlated and complementary depression patterns across modalities, we adopt a chained-fusion mechanism that jointly learns facial appearance and dynamics in a unified framework. This dissertation thus presents our studies on image sentiment analysis, with a particular focus on facial depression recognition.
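The second contribution in the description above merges the two unimodal predictions through a sentiment score. The record does not state the scoring function itself, so the snippet below is only a hypothetical late-fusion sketch: each unimodal model is assumed to output a probability of positive sentiment, and the two are blended with an assumed weight alpha before thresholding.

# Hypothetical late fusion of image and text sentiment predictions.
# The weight alpha and the 0.5 decision threshold are assumptions for
# illustration, not the sentiment score proposed in the thesis.
def fuse_sentiment(p_image: float, p_text: float, alpha: float = 0.6) -> str:
    """p_image, p_text: probability of positive sentiment from each unimodal model."""
    score = alpha * p_image + (1.0 - alpha) * p_text   # weighted combination of the two modalities
    return "positive" if score >= 0.5 else "negative"

print(fuse_sentiment(0.82, 0.35))   # image model confident positive -> prints "positive"

A weighted average is only one possible combination rule; the thesis proposes its own scoring scheme for this step.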
author2 Erik Cambria
author_facet Erik Cambria
Chen, Qian
format Thesis-Doctor of Philosophy
author Chen, Qian
author_sort Chen, Qian
title Sentiment analysis using image, text and video
title_short Sentiment analysis using image, text and video
title_full Sentiment analysis using image, text and video
title_fullStr Sentiment analysis using image, text and video
title_full_unstemmed Sentiment analysis using image, text and video
title_sort sentiment analysis using image, text and video
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/161285
_version_ 1744365401781829632