Multimodal audio-visual emotion detection
Audio and visual utterances in video are temporally and semantically dependent on each other, so modelling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features over audio and video hierarchies for AVER. To extract abstract temporal information, we first build temporal audio and visual sequences, which are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a clear margin.
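The pipeline the abstract describes — per-frame CNN embeddings for each modality, fused and passed through a recurrent network, trained end to end — can be sketched as follows. This is a minimal illustration assuming PyTorch; the layer sizes, the choice of an LSTM, the late-concatenation fusion, and the six emotion classes are all illustrative assumptions, not details taken from the thesis.

```python
# Hedged sketch of a CNN-embedding + recurrent AVER model (assumes PyTorch).
# All architectural specifics here are illustrative, not the thesis's own.
import torch
import torch.nn as nn

class AVERSketch(nn.Module):
    """Per-frame CNN embeddings for audio and video, concatenated per
    time step and fed to an LSTM; the final hidden state is classified."""
    def __init__(self, n_emotions=6, emb=64):
        super().__init__()
        # Small CNNs stand in for the "large CNN embeddings" of the abstract.
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, emb))
        self.video_cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, emb))
        # Recurrent network over the fused temporal sequence.
        self.rnn = nn.LSTM(2 * emb, 64, batch_first=True)
        self.head = nn.Linear(64, n_emotions)

    def forward(self, audio, video):
        # audio: (B, T, 1, samples); video: (B, T, 3, H, W)
        B, T = audio.shape[:2]
        a = self.audio_cnn(audio.flatten(0, 1)).view(B, T, -1)
        v = self.video_cnn(video.flatten(0, 1)).view(B, T, -1)
        _, (h, _) = self.rnn(torch.cat([a, v], dim=-1))
        return self.head(h[-1])  # logits over emotion classes

model = AVERSketch()
logits = model(torch.randn(2, 8, 1, 400),       # 8 audio frames per clip
               torch.randn(2, 8, 3, 32, 32))    # 8 video frames per clip
# logits has shape (batch=2, n_emotions=6)
```

Because every module is differentiable, gradients flow from the emotion logits back through the LSTM into both CNN branches, which is what makes such a design end-to-end trainable.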
Saved in:
Main Author: Chaudhary, Nitesh Kumar
Other Authors: Jagath C Rajapakse
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/153490
Institution: Nanyang Technological University
Language: English
id: sg-ntu-dr.10356-153490
record_format: dspace
spelling:
Record ID: sg-ntu-dr.10356-153490
Timestamp: 2022-01-05T09:23:40Z
Title: Multimodal audio-visual emotion detection
Author: Chaudhary, Nitesh Kumar
Supervisor: Jagath C Rajapakse, School of Computer Science and Engineering (ASJagath@ntu.edu.sg)
Subject: Engineering::Computer science and engineering
Abstract: Audio and visual utterances in video are temporally and semantically dependent on each other, so modelling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features over audio and video hierarchies for AVER. To extract abstract temporal information, we first build temporal audio and visual sequences, which are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a clear margin.
Degree: Master of Engineering
Deposited: 2021-12-06T05:19:42Z
Published: 2021
Format: Thesis-Master by Research
Citation: Chaudhary, N. K. (2021). Multimodal audio-visual emotion detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153490
DOI: 10.32657/10356/153490
Language: en
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Media type: application/pdf
Publisher: Nanyang Technological University
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering
spellingShingle: Engineering::Computer science and engineering Chaudhary, Nitesh Kumar Multimodal audio-visual emotion detection
description: Audio and visual utterances in video are temporally and semantically dependent on each other, so modelling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features over audio and video hierarchies for AVER. To extract abstract temporal information, we first build temporal audio and visual sequences, which are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a clear margin.
author2: Jagath C Rajapakse
author_facet: Jagath C Rajapakse Chaudhary, Nitesh Kumar
format: Thesis-Master by Research
author: Chaudhary, Nitesh Kumar
author_sort: Chaudhary, Nitesh Kumar
title: Multimodal audio-visual emotion detection
title_short: Multimodal audio-visual emotion detection
title_full: Multimodal audio-visual emotion detection
title_fullStr: Multimodal audio-visual emotion detection
title_full_unstemmed: Multimodal audio-visual emotion detection
title_sort: multimodal audio-visual emotion detection
publisher: Nanyang Technological University
publishDate: 2021
url: https://hdl.handle.net/10356/153490
_version_: 1722355385627574272