Multimodal audio-visual emotion detection

Audio and visual utterances in video are temporally and semantically dependent on each other, so modeling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features for the audio and video hierarchies in AVER. To extract abstract temporal information, we first build temporal audio and visual sequences and feed them into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a large margin.
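The abstract does not fix particular network choices, so the following is only a minimal sketch of the pipeline it outlines: per-frame CNN embeddings for the visual stream, a small CNN over per-frame audio features, and a recurrent network over the fused audio-visual sequence. The backbone (ResNet-18), feature dimensions, GRU, and seven emotion classes are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch (assumed architecture, not the thesis code) of the pipeline described
# in the abstract: per-timestep CNN embeddings for audio and visual streams, followed by
# a recurrent network that models their temporal interdependencies end to end.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class AVEmotionNet(nn.Module):
    def __init__(self, audio_feat_dim=64, embed_dim=256, hidden_dim=128, num_classes=7):
        super().__init__()
        # Visual branch: a CNN applied to each video frame (ResNet-18 as a stand-in).
        self.visual_cnn = resnet18(weights=None)
        self.visual_cnn.fc = nn.Linear(self.visual_cnn.fc.in_features, embed_dim)
        # Audio branch: a small 1-D CNN over per-frame audio features (e.g. spectrogram slices).
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(audio_feat_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        # Recurrent context model over the fused audio-visual sequence.
        self.rnn = nn.GRU(2 * embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, frames, audio):
        # frames: (batch, time, 3, H, W); audio: (batch, time, audio_feat_dim, audio_len)
        b, t = frames.shape[:2]
        v = self.visual_cnn(frames.flatten(0, 1)).view(b, t, -1)   # (b, t, embed_dim)
        a = self.audio_cnn(audio.flatten(0, 1)).view(b, t, -1)     # (b, t, embed_dim)
        context, _ = self.rnn(torch.cat([v, a], dim=-1))            # (b, t, 2 * hidden_dim)
        return self.classifier(context.mean(dim=1))                 # utterance-level logits
```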

Bibliographic Details
Main Author: Chaudhary, Nitesh Kumar
Other Authors: Jagath C Rajapakse
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2021
Subjects: Engineering::Computer science and engineering
Online Access:https://hdl.handle.net/10356/153490
Institution: Nanyang Technological University
Degree: Master of Engineering
School: School of Computer Science and Engineering
Other Author: Jagath C Rajapakse (ASJagath@ntu.edu.sg)
Citation: Chaudhary, N. K. (2021). Multimodal audio-visual emotion detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153490
DOI: 10.32657/10356/153490
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Collection: DR-NTU, NTU Library