Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition

Enabling machines to understand human emotions from multimodal signals in dialogue scenarios, the task of multimodal emotion recognition in conversation (MM-ERC), has become an active research topic, and a diverse range of methods has been proposed to improve task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion to maximize feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that feature multimodality and conversational context should be modeled simultaneously during both the disentanglement and fusion steps. In this work, we aim to further improve task performance by fully exploiting these insights. During feature disentanglement, we devise a Dual-level Disentanglement Mechanism (DDM) based on contrastive learning to decouple the features into both a modality space and an utterance space. During feature fusion, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively; together they schedule the proper integration of multimodal and context features. Specifically, CFM dynamically manages the contribution of each modality's features, while CRM flexibly coordinates how dialogue context is introduced. On two public MM-ERC datasets, our system consistently achieves new state-of-the-art performance. Further analyses show that each proposed mechanism benefits the MM-ERC task by adaptively making full use of the multimodal and context features. Our methods also have the potential to benefit a broader range of conversational multimodal tasks.
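The abstract describes the Dual-level Disentanglement Mechanism only at a high level. As a rough illustration, the following is a minimal, hypothetical PyTorch sketch of what a contrastive objective over a shared utterance space and modality-specific spaces could look like; the `info_nce` helper, the noise-based augmentation, and all shapes are assumptions made for this sketch, not the authors' exact DDM formulation.

```python
# Hypothetical sketch of a dual-level contrastive disentanglement objective.
# Shapes, the augmentation, and the loss form are illustrative assumptions,
# not the paper's exact DDM.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of `anchors`;
    every other row serves as a negative."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                      # (N, N) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def dual_level_disentanglement_loss(feats):
    """feats: dict modality -> (num_utterances, dim) features of one dialogue.

    Utterance level: different modalities of the same utterance are positives,
    pulling them toward a shared utterance space.
    Modality level: each modality stream is contrasted with a perturbed view of
    itself, keeping modality-specific structure separable.
    """
    text, audio, vision = feats["text"], feats["audio"], feats["vision"]

    # Utterance-level term: align modalities utterance by utterance.
    utt_loss = (info_nce(text, audio) + info_nce(text, vision)) / 2

    # Modality-level term: contrast each stream with a lightly noised copy
    # (a cheap stand-in for a real augmentation).
    mod_loss = sum(
        info_nce(m, m + 0.01 * torch.randn_like(m)) for m in (text, audio, vision)
    ) / 3

    return utt_loss + mod_loss

# Toy usage: a dialogue of 6 utterances with 128-d features per modality.
feats = {k: torch.randn(6, 128) for k in ("text", "audio", "vision")}
print(dual_level_disentanglement_loss(feats))
```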

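For the fusion stage, a similarly hedged sketch of how contribution-aware weighting and a context gate might be combined is shown below; the gating design, module name, and dimensions are illustrative assumptions and differ from the paper's actual CFM and CRM.

```python
# Hypothetical sketch of contribution-aware multimodal fusion with a context gate.
# The module name, gating scheme, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)            # scores each modality's contribution
        self.context_gate = nn.Linear(2 * dim, dim)

    def forward(self, modality_feats, context):
        """modality_feats: (batch, num_modalities, dim) per-utterance features.
        context: (batch, dim) summary of the surrounding dialogue."""
        # Contribution-aware fusion: softmax over per-modality scores, weighted sum.
        weights = torch.softmax(self.scorer(modality_feats).squeeze(-1), dim=-1)  # (B, M)
        fused = (weights.unsqueeze(-1) * modality_feats).sum(dim=1)               # (B, D)

        # Context gate: decide element-wise how much dialogue context to reintroduce.
        gate = torch.sigmoid(self.context_gate(torch.cat([fused, context], dim=-1)))
        return gate * context + (1 - gate) * fused

# Toy usage: batch of 4 utterances, 3 modalities, 128-d features.
layer = GatedFusion(dim=128)
out = layer(torch.randn(4, 3, 128), torch.randn(4, 128))
print(out.shape)   # torch.Size([4, 128])
```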

Bibliographic Details
Main Authors: LI, Bobo; FEI, Hao; LIAO, Lizi; ZHAO, Yu; TENG, Chong; CHUA, Tat-Seng; JI, Donghong; LI, Fei
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: emotion recognition; multimodal learning; Databases and Information Systems; Graphics and Human Computer Interfaces
DOI: 10.1145/3581783.3612053
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://ink.library.smu.edu.sg/sis_research/8485
https://ink.library.smu.edu.sg/context/sis_research/article/9488/viewcontent/Revist_Disentanglement_Emotion_pv.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems, InK@SMU (SMU Libraries)