KnowleNet: knowledge fusion network for multimodal sarcasm detection
Main Authors: | Yue, Tan; Mao, Rui; Wang, Heng; Hu, Zonghai; Cambria, Erik |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering; Sarcasm Detection; Multimodal Learning |
Online Access: | https://hdl.handle.net/10356/171194 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-171194 |
---|---|
record_format | dspace |
citation | Yue, T., Mao, R., Wang, H., Hu, Z. & Cambria, E. (2023). KnowleNet: knowledge fusion network for multimodal sarcasm detection. Information Fusion, 100, 101921. https://dx.doi.org/10.1016/j.inffus.2023.101921 |
journal | Information Fusion, vol. 100, article 101921 (ISSN 1566-2535) |
doi | 10.1016/j.inffus.2023.101921 |
scopus_eid | 2-s2.0-85165537676 |
funding | The work described in this paper is supported by the BUPT innovation and entrepreneurship support program (2022-YC-S002) and the China Scholarship Council (CSC) under Grant 202206470036. |
date_deposited | 2023-10-17T04:24:11Z |
rights | © 2023 Elsevier B.V. All rights reserved. |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
topic | Engineering::Computer science and engineering; Sarcasm Detection; Multimodal Learning |
description | Sarcasm is a form of communication often used to express contempt or ridicule, where the speaker conveys a message opposite to their true meaning, typically intending to mock or belittle a specific target. Sarcasm detection has gained considerable attention in natural language processing because sarcasm is widespread on social media and difficult for machines to detect. While early efforts in sarcasm detection relied solely on textual data, the abundance of multimodal data on social media cannot be ignored. Recent research has focused on multimodal sarcasm detection, commonly using attention mechanisms and graph neural networks to identify relevant information in both image and text data. However, these methods may overlook the importance of prior knowledge and cross-modal semantic contrast, which are crucial factors in how humans detect sarcasm. In this paper, we propose a novel model named KnowleNet that leverages the ConceptNet knowledge base to incorporate prior knowledge and determine image–text relatedness through sample-level and word-level cross-modal semantic similarity detection. Contrastive learning is also introduced to improve the spatial distribution of sarcastic (positive) and non-sarcastic (negative) samples. The proposed model achieves state-of-the-art performance on publicly available benchmark datasets. |
author2 | School of Computer Science and Engineering |
author | Yue, Tan; Mao, Rui; Wang, Heng; Hu, Zonghai; Cambria, Erik |
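The abstract says KnowleNet judges image-text relatedness through sample-level and word-level cross-modal semantic similarity and uses contrastive learning to separate sarcastic (positive) from non-sarcastic (negative) samples. This record gives no formulas, so what follows is only a generic sketch of those ideas under stated assumptions, not the paper's implementation. Cosine similarity stands in for the similarity measure. The word-level score uses a max-over-regions, mean-over-words heuristic. The loss is the standard supervised contrastive loss with a temperature `tau`. All function names are hypothetical, and the ConceptNet knowledge-fusion step is not modelled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def sample_level_similarity(image_vec: Vector, text_vec: Vector) -> float:
    """Hypothetical sample-level score: one pooled image vector vs. one pooled text vector."""
    return cosine(image_vec, text_vec)


def word_level_similarity(region_vecs: List[Vector], word_vecs: List[Vector]) -> float:
    """Hypothetical word-level score: best-matching image region per word, averaged over words.

    Assumption: this max/mean alignment is a common heuristic, not taken from KnowleNet.
    """
    if not region_vecs or not word_vecs:
        raise ValueError("need at least one image region and one word")
    best = [max(cosine(w, r) for r in region_vecs) for w in word_vecs]
    return sum(best) / len(best)


def supervised_contrastive_loss(
    embeddings: List[Vector], labels: List[int], tau: float = 0.1
) -> float:
    """Supervised contrastive loss: pulls same-label samples together, pushes others apart.

    Assumption: the generic SupCon form, used here only to illustrate the idea.
    """
    n = len(embeddings)
    # Cosine similarity equals the dot product of L2-normalised embeddings.
    sims = [[cosine(embeddings[i], embeddings[j]) / tau for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Log of the softmax denominator over every sample except the anchor itself.
        log_denom = math.log(sum(math.exp(sims[i][a]) for a in range(n) if a != i))
        total += -sum(sims[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

For example, `sample_level_similarity([0.0, 1.0, 0.0], [1.0, 0.0, 0.0])` returns `0.0` for orthogonal embeddings. In this sketch, a low image-text score is the kind of cross-modal contrast signal the abstract refers to. The loss is small when same-label samples already cluster together and grows when they are scattered among the other class.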