KnowleNet: knowledge fusion network for multimodal sarcasm detection

Sarcasm is a form of communication often used to express contempt or ridicule, where the speaker conveys a message opposite to their true meaning, typically intending to mock or belittle a specific target. Sarcasm detection has gained great attention in the field of natural language processing due t...

Bibliographic Details
Main Authors: Yue, Tan, Mao, Rui, Wang, Heng, Hu, Zonghai, Cambria, Erik
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Sarcasm Detection; Multimodal Learning
Online Access: https://hdl.handle.net/10356/171194
Institution: Nanyang Technological University
id sg-ntu-dr.10356-171194
record_format dspace
spelling sg-ntu-dr.10356-171194 2023-10-17T04:24:11Z
Type: Journal Article
Funding: The work described in this paper is supported by the BUPT innovation and entrepreneurship support program (2022-YC-S002) and the China Scholarship Council (CSC) under Grant 202206470036.
Citation: Yue, T., Mao, R., Wang, H., Hu, Z. & Cambria, E. (2023). KnowleNet: knowledge fusion network for multimodal sarcasm detection. Information Fusion, 100, 101921. https://dx.doi.org/10.1016/j.inffus.2023.101921
ISSN: 1566-2535
DOI: 10.1016/j.inffus.2023.101921
Scopus ID: 2-s2.0-85165537676
Rights: © 2023 Elsevier B.V. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic Engineering::Computer science and engineering
Sarcasm Detection
Multimodal Learning
description Sarcasm is a form of communication often used to express contempt or ridicule, where the speaker conveys a message opposite to their true meaning, typically intending to mock or belittle a specific target. Sarcasm detection has gained great attention in the field of natural language processing due to the fact that sarcasm is widespread on social media and difficult for machines to detect. While early efforts in sarcasm detection relied solely on textual data, the abundance of multimodal data on social media cannot be ignored. Recent research has focused on multimodal sarcasm detection, where attention mechanisms and graph neural networks are commonly used to identify relevant information in both image and text data. However, these methods may overlook the importance of prior knowledge and cross-modal semantic contrast, which are crucial factors in how humans detect sarcasm. In this paper, we propose a novel model named KnowleNet that leverages the ConceptNet knowledge base to incorporate prior knowledge and determines image–text relatedness through sample-level and word-level cross-modal semantic similarity detection. Contrastive learning is also introduced to improve the spatial distribution of sarcastic (positive) and non-sarcastic (negative) samples. The proposed model achieves state-of-the-art performance on publicly available benchmark datasets.