Multimodal affective computing for video summarization

This thesis merges affective computing and video summarization, enhancing the latter with cross-disciplinary affective information; the combined approach is termed affective video summarization. Affective video summarization identifies emotionally impactful moments in a video from emotional cues, producing summaries that improve the user experience. Existing visual-based summarization methods frequently neglect affective information that could improve summaries through emotional considerations; alternatively, they discard the visual modality entirely and rely on other modalities, such as EEG signals, to derive visual attention or emotion tags for summarization. A plausible cause is that the emotion labels needed to guide video summarization are costly to acquire, and large quantities are required to capture the nuance needed for personalization and emotional subtlety. This study therefore addresses the twin problems of expensive human annotation and scalability in affective video summarization.

The thesis proposes using EEG as a secondary modality that supplies emotional cues for video summarization. The central challenge is demonstrating that EEG features retain affective information after conversion into a latent representation. The thesis thus investigates three areas, each illustrated by a sketch below:

1) Emotion recognition by spatiotemporal modeling, to show that EEG features contain affective information. This preliminary study introduces Regionally-Operated Domain Adversarial Networks (RODAN), an attention-based model for EEG-based emotion classification.

2) Affective semantics analysis by generative modeling, employing a Superposition Quantized Variational Autoencoder (SQVAE) built on an orthonormal eigenvector codebook with a spatiotemporal transformer as encoder and decoder, to generate EEG latent representations and validate that they preserve affective information.

3) Affective-semantics-guided video summarization with deep reinforcement learning, proposing EEG-Video Emotion-based Summarization (EVES), a policy-based reinforcement learning model that integrates video and EEG signals for emotion-based summarization.

In the first study, RODAN achieved emotion classification accuracies of 60.75% on SEED-IV and 31.84% on DEAP, indicating the presence of affective information. Subsequently, EEG signals reconstructed by SQVAE on MAHNOB-HCI aligned closely with the originals, and emotion recognition on the latent representations confirmed that affective information is preserved. Finally, with multimodal pre-training, EVES produced summaries that were 11.4% more coherent and 7.4% more emotion-evoking than alternative reinforcement learning models. Overall, this thesis establishes that EEG signals encode affective information and that multimodal video summarization enhances the coherence and emotional impact of summaries.
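
The domain-adversarial idea behind RODAN (area 1) can be illustrated with a minimal gradient-reversal sketch. This is not the thesis's RODAN architecture, which is attention-based and regionally operated; the flat encoder, layer sizes, and EEG dimensions below are hypothetical, chosen only to show how a domain head trained through reversed gradients pushes the encoder toward subject-invariant emotion features.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient drives the encoder to *confuse* the domain head.
        return -ctx.lambd * grad_output, None

class EEGDomainAdversarialNet(nn.Module):
    # Channel/sample counts are illustrative (a 62-channel EEG window),
    # not the thesis's configuration.
    def __init__(self, n_channels=62, n_samples=200, n_emotions=4, n_subjects=15):
        super().__init__()
        self.encoder = nn.Sequential(                    # shared feature extractor
            nn.Flatten(),
            nn.Linear(n_channels * n_samples, 256),
            nn.ReLU(),
        )
        self.emotion_head = nn.Linear(256, n_emotions)   # task classifier
        self.domain_head = nn.Linear(256, n_subjects)    # subject (domain) classifier

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        # The domain head sees gradient-reversed features, so jointly minimizing
        # both losses yields subject-invariant emotion representations.
        return self.emotion_head(z), self.domain_head(GradReverse.apply(z, lambd))

# Example forward pass on a batch of 8 hypothetical EEG windows.
emo_logits, dom_logits = EEGDomainAdversarialNet()(torch.randn(8, 62, 200))
```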

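Likewise, the role of SQVAE's orthonormal codebook (area 2) can be sketched as quantization by sparse superposition over an orthonormal basis. The thesis derives its codebook from eigenvectors and wraps the quantizer in a spatiotemporal transformer encoder and decoder; the QR-derived basis and the top-k selection rule below are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class OrthonormalSuperpositionQuantizer(nn.Module):
    """Quantizes a latent vector as a sparse superposition of orthonormal codewords."""
    def __init__(self, dim=64, k=8):
        super().__init__()
        # QR of a random matrix gives an orthonormal basis; the thesis builds
        # its codebook from eigenvectors instead (hypothetical substitute here).
        q, _ = torch.linalg.qr(torch.randn(dim, dim))
        self.register_buffer("codebook", q)  # columns are orthonormal codewords
        self.k = k

    def forward(self, z):
        coeffs = z @ self.codebook                   # project onto the codewords
        # Keep the k largest-magnitude coefficients: each latent becomes a
        # superposition of k codewords, which is the idea the name suggests.
        idx = torch.topk(coeffs.abs(), self.k, dim=-1).indices
        mask = torch.zeros_like(coeffs).scatter_(-1, idx, 1.0)
        z_q = (coeffs * mask) @ self.codebook.T      # reconstruct from the basis
        # Straight-through estimator: quantized forward pass, identity backward.
        return z + (z_q - z).detach()

# Example: quantize a batch of 8 hypothetical 64-d EEG latents.
z_q = OrthonormalSuperpositionQuantizer()(torch.randn(8, 64))
```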
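Finally, the policy-based reinforcement learning in EVES (area 3) can be sketched as REINFORCE over per-frame keep/skip decisions, with video and EEG features fused frame by frame. The concatenation fusion, the GRU policy, and the placeholder reward below are hypothetical; the thesis's actual reward terms for coherence and emotion differ.

```python
import torch
import torch.nn as nn

class FrameSelectionPolicy(nn.Module):
    def __init__(self, video_dim=512, eeg_dim=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(video_dim + eeg_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, video_feats, eeg_feats):
        # Fuse per-frame video and EEG features by concatenation (an assumption).
        h, _ = self.rnn(torch.cat([video_feats, eeg_feats], dim=-1))
        return torch.sigmoid(self.head(h)).squeeze(-1)  # per-frame keep probability

policy = FrameSelectionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
video, eeg = torch.randn(1, 120, 512), torch.randn(1, 120, 64)  # 120 frames

probs = policy(video, eeg)
dist = torch.distributions.Bernoulli(probs)
actions = dist.sample()                      # 1 = keep the frame in the summary
# Placeholder reward: mean "emotion intensity" of kept frames minus a length
# penalty; purely illustrative, not the thesis's reward function.
emotion_score = eeg.norm(dim=-1) / eeg.shape[-1] ** 0.5
reward = (actions * emotion_score).sum() / actions.sum().clamp(min=1) \
         - 0.01 * actions.mean()
opt.zero_grad()
loss = -(dist.log_prob(actions).sum() * reward)  # REINFORCE objective
loss.backward()
opt.step()
```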

Bibliographic Details
Main Author: Lew, Lincoln Wai Cheong
Other Authors: Quek Hiok Chai, Tan Ah Hwee
School: School of Computer Science and Engineering
Organisations: A*STAR Institute for Infocomm Research
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science; Video summarization; EEG emotion recognition
Online Access: https://hdl.handle.net/10356/174824
DOI: 10.32657/10356/174824
Citation: Lew, L. W. C. (2023). Multimodal affective computing for video summarization. Doctoral thesis, Nanyang Technological University, Singapore.
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Institution: Nanyang Technological University