Fusing pairwise modalities for emotion recognition in conversations
Multimodal fusion has the potential to significantly enhance model performance in the domain of Emotion Recognition in Conversations (ERC) by efficiently integrating information from diverse modalities. However, existing methods face challenges because they directly integrate information from different modalities, making it difficult to assess the individual impact of each modality during training and to capture nuanced fusion. To address these challenges, we propose a novel framework named Fusing Pairwise Modalities for ERC. In this method, a pairwise fusion technique is incorporated into multimodal fusion, enabling each modality to contribute unique information and thereby facilitating a more comprehensive understanding of the emotional context. Additionally, a designed density loss is applied to characterise fused feature density, with a specific focus on mitigating redundancy in pairwise fusion. The density loss penalises feature density during training, contributing to a more efficient and effective fusion process. To validate the proposed framework, we conduct comprehensive experiments on two benchmark datasets, IEMOCAP and MELD. The results demonstrate the superior performance of our approach compared to state-of-the-art methods, indicating its effectiveness in addressing challenges related to multimodal fusion in the context of ERC.
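The abstract outlines two ideas: fusing modalities in pairs rather than all at once, and penalising the "density" of the fused features during training. The snippet below is a purely illustrative sketch, not the authors' published implementation: it shows one plausible way to wire pairwise fusion of pre-extracted text, audio and visual utterance features together with a hypothetical density penalty. All module names, feature dimensions, and the exact form of the penalty are assumptions.

```python
# Illustrative sketch only -- NOT the paper's implementation.
# Assumes pre-extracted utterance features for three modalities.
import torch
import torch.nn as nn

class PairwiseFusionERC(nn.Module):
    """Fuses modalities in pairs (text-audio, text-visual, audio-visual)
    before classification; structure and sizes are assumptions."""
    def __init__(self, dim_t=100, dim_a=100, dim_v=100, hidden=128, n_classes=6):
        super().__init__()
        # One small fusion MLP per modality pair.
        self.fuse_ta = nn.Sequential(nn.Linear(dim_t + dim_a, hidden), nn.ReLU())
        self.fuse_tv = nn.Sequential(nn.Linear(dim_t + dim_v, hidden), nn.ReLU())
        self.fuse_av = nn.Sequential(nn.Linear(dim_a + dim_v, hidden), nn.ReLU())
        self.classifier = nn.Linear(3 * hidden, n_classes)

    def forward(self, t, a, v):
        # Each pair contributes its own fused representation.
        h_ta = self.fuse_ta(torch.cat([t, a], dim=-1))
        h_tv = self.fuse_tv(torch.cat([t, v], dim=-1))
        h_av = self.fuse_av(torch.cat([a, v], dim=-1))
        fused = torch.cat([h_ta, h_tv, h_av], dim=-1)
        return self.classifier(fused), fused

def density_penalty(fused, weight=0.01):
    # Hypothetical "density" regulariser: a simple L1 penalty that
    # discourages dense, redundant fused features. The paper's actual
    # density loss may be defined differently.
    return weight * fused.abs().mean()

# Minimal usage example with random utterance features (batch of 4).
model = PairwiseFusionERC()
t, a, v = torch.randn(4, 100), torch.randn(4, 100), torch.randn(4, 100)
labels = torch.randint(0, 6, (4,))
logits, fused = model(t, a, v)
loss = nn.functional.cross_entropy(logits, labels) + density_penalty(fused)
loss.backward()
```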
Main Authors: | Fan, Chunxiao; Lin, Jie; Mao, Rui; Cambria, Erik |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Multimodal; Feature fusion |
Online Access: | https://hdl.handle.net/10356/175811 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-175811 |
---|---|
record_format |
dspace |
type |
Journal Article |
date_accessioned |
2024-05-07T02:09:08Z |
journal |
Information Fusion |
citation |
Fan, C., Lin, J., Mao, R. & Cambria, E. (2024). Fusing pairwise modalities for emotion recognition in conversations. Information Fusion, 106, 102306. https://dx.doi.org/10.1016/j.inffus.2024.102306 |
issn |
1566-2535 |
doi |
10.1016/j.inffus.2024.102306 |
scopus_id |
2-s2.0-85185399986 |
funding |
This work is supported in part by the National Key Research and Development Program of China (No. 2022YFC3803200), the Natural Science Foundation of China (No. 61802105), the University Synergy Innovation Program of Anhui Province, China (No. GXXT-2021-005 and GXXT-2022-033), and the Fundamental Research Funds for the Central Universities, China (No. JZ2022HGTB0250 and PA2023IISL0096). |
rights |
© 2024 Elsevier B.V. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science; Multimodal; Feature fusion |
author2 |
School of Computer Science and Engineering |
format |
Article |
author |
Fan, Chunxiao; Lin, Jie; Mao, Rui; Cambria, Erik |
author_sort |
Fan, Chunxiao |
title |
Fusing pairwise modalities for emotion recognition in conversations |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/175811 |