Fusing pairwise modalities for emotion recognition in conversations

Multimodal fusion has the potential to significantly enhance model performance in the domain of Emotion Recognition in Conversations (ERC) by efficiently integrating information from diverse modalities. However, existing methods face challenges as they directly integrate information from different modalities, making it difficult to assess the individual impact of each modality during training and to capture nuanced fusion. To address this, we propose a novel framework named Fusing Pairwise Modalities for ERC. In this method, a pairwise fusion technique is incorporated into multimodal fusion to enhance model performance, enabling each modality to contribute unique information and thereby facilitating a more comprehensive understanding of the emotional context. Additionally, a designed density loss is applied to characterise fused feature density, with a specific focus on mitigating redundancy in pairwise fusion methods. The density loss penalises feature density during training, contributing to a more efficient and effective fusion process. To validate the proposed framework, we conduct comprehensive experiments on two benchmark datasets, namely IEMOCAP and MELD. The results demonstrate the superior performance of our approach compared to state-of-the-art methods, indicating its effectiveness in addressing challenges related to multimodal fusion in the context of ERC.
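
The record does not include the paper's implementation, but the abstract's core idea, fusing each pair of modalities separately and regularising the fused features, can be illustrated with a minimal sketch. The module names, feature dimensions, and the L1 penalty standing in for the density loss below are assumptions for illustration, not the authors' method.

# Hypothetical sketch of pairwise modality fusion with a density-style penalty.
# Names, dimensions, and the exact loss form are assumptions; the paper's
# formulation is not reproduced in this record.
import torch
import torch.nn as nn

class PairwiseFusionERC(nn.Module):
    def __init__(self, dim_text=100, dim_audio=100, dim_visual=100, hidden=128, n_classes=6):
        super().__init__()
        # One fusion head per modality pair: (text, audio), (text, visual), (audio, visual).
        self.fuse_ta = nn.Linear(dim_text + dim_audio, hidden)
        self.fuse_tv = nn.Linear(dim_text + dim_visual, hidden)
        self.fuse_av = nn.Linear(dim_audio + dim_visual, hidden)
        self.classifier = nn.Linear(3 * hidden, n_classes)

    def forward(self, t, a, v):
        # Fuse each pair separately so every modality contributes through two heads.
        h_ta = torch.tanh(self.fuse_ta(torch.cat([t, a], dim=-1)))
        h_tv = torch.tanh(self.fuse_tv(torch.cat([t, v], dim=-1)))
        h_av = torch.tanh(self.fuse_av(torch.cat([a, v], dim=-1)))
        fused = torch.cat([h_ta, h_tv, h_av], dim=-1)
        return self.classifier(fused), fused

def density_penalty(fused, weight=1e-3):
    # Placeholder "density" regulariser: an L1 penalty on the fused features,
    # standing in for the paper's density loss that discourages redundancy.
    return weight * fused.abs().mean()

# Usage: combine cross-entropy with the density penalty during training.
model = PairwiseFusionERC()
t, a, v = torch.randn(8, 100), torch.randn(8, 100), torch.randn(8, 100)
labels = torch.randint(0, 6, (8,))
logits, fused = model(t, a, v)
loss = nn.functional.cross_entropy(logits, labels) + density_penalty(fused)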

Bibliographic Details
Main Authors: Fan, Chunxiao; Lin, Jie; Mao, Rui; Cambria, Erik
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2024
Subjects: Computer and Information Science; Multimodal; Feature fusion
Online Access:https://hdl.handle.net/10356/175811
Institution: Nanyang Technological University
ISSN: 1566-2535
DOI: 10.1016/j.inffus.2024.102306
Citation: Fan, C., Lin, J., Mao, R. & Cambria, E. (2024). Fusing pairwise modalities for emotion recognition in conversations. Information Fusion, 106, 102306. https://dx.doi.org/10.1016/j.inffus.2024.102306
Funding: This work is supported in part by the National Key Research and Development Program of China (No. 2022YFC3803200), the Natural Science Foundation of China (No. 61802105), the University Synergy Innovation Program of Anhui Province, China (No. GXXT-2021-005 and GXXT-2022-033), and the Fundamental Research Funds for the Central Universities, China (No. JZ2022HGTB0250 and PA2023IISL0096).
Rights: © 2024 Elsevier B.V. All rights reserved.