Multimodal audio-visual emotion detection
Audio and visual utterances in video are temporally and semantically dependent on each other, so modelling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features over audio and video hierarchies for AVER. To extract abstract temporal information, we first build temporal audio and visual sequences, which are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a clear margin.
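The pipeline the abstract describes — per-frame CNN embeddings for each modality, fused and passed through a recurrent network, trained end to end — can be sketched as follows. This is a minimal illustration assuming PyTorch; the layer sizes, the choice of an LSTM, the late-concatenation fusion, and the six emotion classes are all illustrative assumptions, not details taken from the thesis.

```python
# Hedged sketch of a CNN-embedding + recurrent AVER model (assumes PyTorch).
# All architectural specifics here are illustrative, not the thesis's own.
import torch
import torch.nn as nn

class AVERSketch(nn.Module):
    """Per-frame CNN embeddings for audio and video, concatenated per
    time step and fed to an LSTM; the final hidden state is classified."""
    def __init__(self, n_emotions=6, emb=64):
        super().__init__()
        # Small CNNs stand in for the "large CNN embeddings" of the abstract.
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, emb))
        self.video_cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, emb))
        # Recurrent network over the fused temporal sequence.
        self.rnn = nn.LSTM(2 * emb, 64, batch_first=True)
        self.head = nn.Linear(64, n_emotions)

    def forward(self, audio, video):
        # audio: (B, T, 1, samples); video: (B, T, 3, H, W)
        B, T = audio.shape[:2]
        a = self.audio_cnn(audio.flatten(0, 1)).view(B, T, -1)
        v = self.video_cnn(video.flatten(0, 1)).view(B, T, -1)
        _, (h, _) = self.rnn(torch.cat([a, v], dim=-1))
        return self.head(h[-1])  # logits over emotion classes

model = AVERSketch()
logits = model(torch.randn(2, 8, 1, 400),       # 8 audio frames per clip
               torch.randn(2, 8, 3, 32, 32))    # 8 video frames per clip
# logits has shape (batch=2, n_emotions=6)
```

Because every module is differentiable, gradients flow from the emotion logits back through the LSTM into both CNN branches, which is what makes such a design end-to-end trainable.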
Saved in:
Main Author: Chaudhary, Nitesh Kumar
Other Authors: Jagath C Rajapakse
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/153490
Institution: Nanyang Technological University
Language: English
id: sg-ntu-dr.10356-153490
record_format: dspace
spelling:
Record ID: sg-ntu-dr.10356-153490
Timestamp: 2022-01-05T09:23:40Z
Title: Multimodal audio-visual emotion detection
Author: Chaudhary, Nitesh Kumar
Supervisor: Jagath C Rajapakse, School of Computer Science and Engineering (ASJagath@ntu.edu.sg)
Subject: Engineering::Computer science and engineering
Abstract: Audio and visual utterances in video are temporally and semantically dependent on each other, so modelling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features over audio and video hierarchies for AVER. To extract abstract temporal information, we first build temporal audio and visual sequences, which are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a clear margin.
Degree: Master of Engineering
Deposited: 2021-12-06T05:19:42Z
Published: 2021
Format: Thesis-Master by Research
Citation: Chaudhary, N. K. (2021). Multimodal audio-visual emotion detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153490
DOI: 10.32657/10356/153490
Language: en
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Media type: application/pdf
Publisher: Nanyang Technological University
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering
spellingShingle: Engineering::Computer science and engineering Chaudhary, Nitesh Kumar Multimodal audio-visual emotion detection
description: Audio and visual utterances in video are temporally and semantically dependent on each other, so modelling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features over audio and video hierarchies for AVER. To extract abstract temporal information, we first build temporal audio and visual sequences, which are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a clear margin.
author2: Jagath C Rajapakse
author_facet: Jagath C Rajapakse Chaudhary, Nitesh Kumar
format: Thesis-Master by Research
author: Chaudhary, Nitesh Kumar
author_sort: Chaudhary, Nitesh Kumar
title: Multimodal audio-visual emotion detection
title_short: Multimodal audio-visual emotion detection
title_full: Multimodal audio-visual emotion detection
title_fullStr: Multimodal audio-visual emotion detection
title_full_unstemmed: Multimodal audio-visual emotion detection
title_sort: multimodal audio-visual emotion detection
publisher: Nanyang Technological University
publishDate: 2021
url: https://hdl.handle.net/10356/153490
_version_: 1722355385627574272