Deep multimodal learning for affective analysis and retrieval

Social media has been a convenient platform for voicing opinions through posting messages, ranging from tweeting a short text to uploading a media file, or any combination of messages. Understanding the perceived emotions inherently underlying these user-generated contents (UGC) could bring light to emerging applications such as advertising and media analytics. Existing research efforts on affective computation are mostly dedicated to single media, either text captions or visual content. Few attempts for combined analysis of multiple media are made, despite that emotion can be viewed as an expression of multimodal experience. In this paper, we explore the learning of highly non-linear relationships that exist among low-level features across different modalities for emotion prediction. Using the deep Boltzmann machine (DBM), a joint density model over the space of multimodal inputs, including visual, auditory, and textual modalities, is developed. The model is trained directly using UGC data without any labeling efforts. While the model learns a joint representation over multimodal inputs, training samples in absence of certain modalities can also be leveraged. More importantly, the joint representation enables emotion-oriented cross-modal retrieval, for example, retrieval of videos using the text query "crazy cat". The model does not restrict the types of input and output, and hence, in principle, emotion prediction and retrieval on any combinations of media are feasible. Extensive experiments on web videos and images show that the learnt joint representation could be very compact and be complementary to hand-crafted features, leading to performance improvement in both emotion classification and cross-modal retrieval.

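
To make the modeling idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: a toy restricted Boltzmann machine in NumPy whose single hidden layer pools evidence from two visible blocks standing in for a visual feature vector and a text bag-of-words, trained with one step of contrastive divergence (CD-1). The class name MultimodalRBM, the dimensions, the learning rate, and the synthetic data are all illustrative assumptions; the paper's actual model is a deeper multimodal DBM over visual, auditory, and textual inputs with modality-specific pathways.

# Conceptual sketch only (assumptions noted above); not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultimodalRBM:
    def __init__(self, n_visual, n_text, n_hidden, lr=0.05):
        # Separate weight matrices connect each modality's visible block
        # to one shared hidden layer, which acts as the joint representation.
        self.Wv = rng.normal(0, 0.01, (n_visual, n_hidden))  # visual -> hidden
        self.Wt = rng.normal(0, 0.01, (n_text, n_hidden))    # text   -> hidden
        self.bh = np.zeros(n_hidden)   # hidden biases
        self.bv = np.zeros(n_visual)   # visual biases
        self.bt = np.zeros(n_text)     # text biases
        self.lr = lr

    def hidden_probs(self, v, t):
        # Hidden units pool evidence from both modalities: the joint code.
        return sigmoid(v @ self.Wv + t @ self.Wt + self.bh)

    def cd1_step(self, v0, t0):
        # One CD-1 update on a mini-batch of paired (visual, text) samples.
        h0 = self.hidden_probs(v0, t0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.Wv.T + self.bv)  # reconstruct visual block
        t1 = sigmoid(h_sample @ self.Wt.T + self.bt)  # reconstruct text block
        h1 = self.hidden_probs(v1, t1)
        n = v0.shape[0]
        self.Wv += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.Wt += self.lr * (t0.T @ h0 - t1.T @ h1) / n
        self.bh += self.lr * (h0 - h1).mean(axis=0)
        self.bv += self.lr * (v0 - v1).mean(axis=0)
        self.bt += self.lr * (t0 - t1).mean(axis=0)

# Tiny synthetic example: 20 samples, 8-d "visual" and 12-d "text" inputs.
model = MultimodalRBM(n_visual=8, n_text=12, n_hidden=6)
v = (rng.random((20, 8)) > 0.5).astype(float)
t = (rng.random((20, 12)) > 0.5).astype(float)
for _ in range(100):
    model.cd1_step(v, t)
joint = model.hidden_probs(v, t)   # shared code over both modalities
print(joint.shape)                 # (20, 6)

The hidden activations returned by hidden_probs play the role of the joint representation described in the abstract: in the paper's setting such a code would feed an emotion classifier or serve as a common space for cross-modal retrieval, and with a full DBM a missing modality could be inferred rather than required as input.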

Saved in:
Bibliographic Details
Main Authors: PANG, Lei, ZHU, Shiai, NGO, Chong-wah
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2015
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/6356
https://ink.library.smu.edu.sg/context/sis_research/article/7359/viewcontent/deep_multimodal_emotion_pl.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-7359
record_format dspace
spelling sg-smu-ink.sis_research-7359 2021-11-23T03:47:59Z Deep multimodal learning for affective analysis and retrieval PANG, Lei ZHU, Shiai NGO, Chong-wah Social media has been a convenient platform for voicing opinions through posting messages, ranging from tweeting a short text to uploading a media file, or any combination of messages. Understanding the perceived emotions inherently underlying these user-generated contents (UGC) could bring light to emerging applications such as advertising and media analytics. Existing research efforts on affective computation are mostly dedicated to single media, either text captions or visual content. Few attempts for combined analysis of multiple media are made, despite that emotion can be viewed as an expression of multimodal experience. In this paper, we explore the learning of highly non-linear relationships that exist among low-level features across different modalities for emotion prediction. Using the deep Boltzmann machine (DBM), a joint density model over the space of multimodal inputs, including visual, auditory, and textual modalities, is developed. The model is trained directly using UGC data without any labeling efforts. While the model learns a joint representation over multimodal inputs, training samples in absence of certain modalities can also be leveraged. More importantly, the joint representation enables emotion-oriented cross-modal retrieval, for example, retrieval of videos using the text query "crazy cat". The model does not restrict the types of input and output, and hence, in principle, emotion prediction and retrieval on any combinations of media are feasible. Extensive experiments on web videos and images show that the learnt joint representation could be very compact and be complementary to hand-crafted features, leading to performance improvement in both emotion classification and cross-modal retrieval. 2015-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6356 info:doi/10.1109/TMM.2015.2482228 https://ink.library.smu.edu.sg/context/sis_research/article/7359/viewcontent/deep_multimodal_emotion_pl.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Cross-modal retrieval deep Boltzmann machine emotion analysis multimodal learning Data Storage Systems Graphics and Human Computer Interfaces
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Cross-modal retrieval
deep Boltzmann machine
emotion analysis
multimodal learning
Data Storage Systems
Graphics and Human Computer Interfaces
spellingShingle Cross-modal retrieval
deep Boltzmann machine
emotion analysis
multimodal learning
Data Storage Systems
Graphics and Human Computer Interfaces
PANG, Lei
ZHU, Shiai
NGO, Chong-wah
Deep multimodal learning for affective analysis and retrieval
description Social media has been a convenient platform for voicing opinions through posting messages, ranging from tweeting a short text to uploading a media file, or any combination of messages. Understanding the perceived emotions inherently underlying these user-generated contents (UGC) could bring light to emerging applications such as advertising and media analytics. Existing research efforts on affective computation are mostly dedicated to single media, either text captions or visual content. Few attempts for combined analysis of multiple media are made, despite that emotion can be viewed as an expression of multimodal experience. In this paper, we explore the learning of highly non-linear relationships that exist among low-level features across different modalities for emotion prediction. Using the deep Boltzmann machine (DBM), a joint density model over the space of multimodal inputs, including visual, auditory, and textual modalities, is developed. The model is trained directly using UGC data without any labeling efforts. While the model learns a joint representation over multimodal inputs, training samples in absence of certain modalities can also be leveraged. More importantly, the joint representation enables emotion-oriented cross-modal retrieval, for example, retrieval of videos using the text query "crazy cat". The model does not restrict the types of input and output, and hence, in principle, emotion prediction and retrieval on any combinations of media are feasible. Extensive experiments on web videos and images show that the learnt joint representation could be very compact and be complementary to hand-crafted features, leading to performance improvement in both emotion classification and cross-modal retrieval.
format text
author PANG, Lei
ZHU, Shiai
NGO, Chong-wah
author_facet PANG, Lei
ZHU, Shiai
NGO, Chong-wah
author_sort PANG, Lei
title Deep multimodal learning for affective analysis and retrieval
title_short Deep multimodal learning for affective analysis and retrieval
title_full Deep multimodal learning for affective analysis and retrieval
title_fullStr Deep multimodal learning for affective analysis and retrieval
title_full_unstemmed Deep multimodal learning for affective analysis and retrieval
title_sort deep multimodal learning for affective analysis and retrieval
publisher Institutional Knowledge at Singapore Management University
publishDate 2015
url https://ink.library.smu.edu.sg/sis_research/6356
https://ink.library.smu.edu.sg/context/sis_research/article/7359/viewcontent/deep_multimodal_emotion_pl.pdf
_version_ 1770575940790779904