Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos
Detecting emotions, such as "anger" and "sadness", from user-generated videos has attracted widespread interest recently. The problem is challenging because effectively representing video data with multi-view information (e.g., audio, video, or text) is not trivial. In contrast to existing works that extract features from each modality (view) separately, followed by early or late fusion, we propose to learn a joint density model over the space of multimodal inputs (including the visual, auditory, and textual modalities) with a Deep Boltzmann Machine (DBM). The model is trained directly on user-generated Web videos without any labeling effort. More importantly, the deep architecture opens up the possibility of discovering the highly non-linear relationships that exist between low-level features across different modalities. The experimental results show that the DBM learns a joint representation complementary to the hand-crafted visual and auditory features, leading to a 7.7% improvement in classification accuracy on the recently released VideoEmotion dataset.
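The abstract describes learning a joint representation over visual, auditory, and textual inputs without labels, which is then combined with hand-crafted features for classification. The snippet below is a minimal, illustrative sketch of that general idea using greedy layer-wise RBM pretraining on toy random data; the `RBM` class, all layer sizes, and the synthetic "features" are assumptions for illustration only and do not reproduce the paper's actual DBM, which relies on joint training with mean-field inference over real multimodal features.

```python
# Minimal sketch: per-modality RBMs followed by a joint RBM over their
# hidden activations (a layer-wise approximation of a multimodal DBM).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Approximate gradient and parameter update
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def pretrain(rbm, data, epochs=20, batch=32):
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            rbm.cd1_step(data[i:i + batch])

# Toy stand-ins for low-level visual, auditory, and textual features
# (binarised purely to suit the Bernoulli units of this sketch).
visual = (rng.random((256, 100)) < 0.3).astype(float)
audio  = (rng.random((256, 60))  < 0.3).astype(float)
text   = (rng.random((256, 40))  < 0.3).astype(float)

# One modality-specific RBM per input pathway.
rbm_v, rbm_a, rbm_t = RBM(100, 64), RBM(60, 32), RBM(40, 16)
for rbm, data in [(rbm_v, visual), (rbm_a, audio), (rbm_t, text)]:
    pretrain(rbm, data)

# Joint layer over the concatenated modality-specific hidden activations.
joint_in = np.hstack([rbm_v.hidden_probs(visual),
                      rbm_a.hidden_probs(audio),
                      rbm_t.hidden_probs(text)])
rbm_joint = RBM(joint_in.shape[1], 64)
pretrain(rbm_joint, joint_in)

# The joint activations would then be concatenated with hand-crafted
# descriptors and fed to an off-the-shelf classifier.
joint_repr = rbm_joint.hidden_probs(joint_in)
print(joint_repr.shape)  # (256, 64)
```

In this sketch the joint layer only sees the modality-specific hidden codes; a full DBM would additionally refine all layers together so that evidence from one modality can influence the representation of the others.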
Main Authors: | PANG, Lei; NGO, Chong-wah |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2015 |
Subjects: | Deep Boltzmann Machine; Emotion analysis; Multimodal learning; Databases and Information Systems; Graphics and Human Computer Interfaces |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6502 https://ink.library.smu.edu.sg/context/sis_research/article/7505/viewcontent/2671188.2749400.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-7505 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-7505 2022-01-10T04:53:13Z Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos PANG, Lei; NGO, Chong-wah. Detecting emotions, such as "anger" and "sadness", from user-generated videos has attracted widespread interest recently. The problem is challenging because effectively representing video data with multi-view information (e.g., audio, video, or text) is not trivial. In contrast to existing works that extract features from each modality (view) separately, followed by early or late fusion, we propose to learn a joint density model over the space of multimodal inputs (including the visual, auditory, and textual modalities) with a Deep Boltzmann Machine (DBM). The model is trained directly on user-generated Web videos without any labeling effort. More importantly, the deep architecture opens up the possibility of discovering the highly non-linear relationships that exist between low-level features across different modalities. The experimental results show that the DBM learns a joint representation complementary to the hand-crafted visual and auditory features, leading to a 7.7% improvement in classification accuracy on the recently released VideoEmotion dataset. 2015-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6502 info:doi/10.1145/2671188.2749400 https://ink.library.smu.edu.sg/context/sis_research/article/7505/viewcontent/2671188.2749400.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Deep Boltzmann Machine; Emotion analysis; Multimodal learning; Databases and Information Systems; Graphics and Human Computer Interfaces |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Deep Boltzmann Machine; Emotion analysis; Multimodal learning; Databases and Information Systems; Graphics and Human Computer Interfaces |
spellingShingle | Deep Boltzmann Machine; Emotion analysis; Multimodal learning; Databases and Information Systems; Graphics and Human Computer Interfaces; PANG, Lei; NGO, Chong-wah; Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos |
description | Detecting emotions, such as "anger" and "sadness", from user-generated videos has attracted widespread interest recently. The problem is challenging because effectively representing video data with multi-view information (e.g., audio, video, or text) is not trivial. In contrast to existing works that extract features from each modality (view) separately, followed by early or late fusion, we propose to learn a joint density model over the space of multimodal inputs (including the visual, auditory, and textual modalities) with a Deep Boltzmann Machine (DBM). The model is trained directly on user-generated Web videos without any labeling effort. More importantly, the deep architecture opens up the possibility of discovering the highly non-linear relationships that exist between low-level features across different modalities. The experimental results show that the DBM learns a joint representation complementary to the hand-crafted visual and auditory features, leading to a 7.7% improvement in classification accuracy on the recently released VideoEmotion dataset. |
format | text |
author | PANG, Lei; NGO, Chong-wah |
author_facet | PANG, Lei; NGO, Chong-wah |
author_sort | PANG, Lei |
title | Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos |
title_short | Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos |
title_full | Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos |
title_fullStr | Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos |
title_full_unstemmed | Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos |
title_sort | multimodal learning with deep boltzmann machine for emotion prediction in user generated videos |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2015 |
url | https://ink.library.smu.edu.sg/sis_research/6502 https://ink.library.smu.edu.sg/context/sis_research/article/7505/viewcontent/2671188.2749400.pdf |
_version_ | 1770575977152249856 |