KnowleNet: knowledge fusion network for multimodal sarcasm detection

Sarcasm is a form of communication often used to express contempt or ridicule, where the speaker conveys a message opposite to their true meaning, typically intending to mock or belittle a specific target. Sarcasm detection has gained great attention in the field of natural language processing due t...

Full description

Saved in:
Bibliographic Details
Main Authors: Yue, Tan, Mao, Rui, Wang, Heng, Hu, Zonghai, Cambria, Erik
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/171194
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Sarcasm is a form of communication often used to express contempt or ridicule, where the speaker conveys a message opposite to their true meaning, typically intending to mock or belittle a specific target. Sarcasm detection has gained great attention in the field of natural language processing due to the fact that sarcasm is widespread on social media and difficult to detect for machines. While early efforts in sarcasm detection solely relied on textual data, the abundance of multimodal data on social media is also non-negligible. Recent research has focused on multimodal sarcasm detection, where attention mechanisms and graph neural networks were commonly used to identify relevant information in both image and text data. However, these methods may overlook the importance of prior knowledge and cross-modal semantic contrast, which are crucial factors for human sarcasm detection. In this paper, we propose a novel model named KnowleNet that leverages the ConceptNet knowledge base to incorporate prior knowledge and determine image–text relatedness through sample-level and word-level cross-modal semantic similarity detection. Contrastive learning is also introduced to improve the spatial distribution of sarcastic (positive) and non-sarcastic (negative) samples. The proposed model achieves state-of-the-art performance on publicly available benchmark datasets.