Multimodal misinformation detection by learning from synthetic data with multimodal LLMs

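Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.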

Bibliographic Details
Main Authors: ZENG, Fengzhu, LI, Wenqian, GAO, Wei, PANG, Yan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/9879
https://ink.library.smu.edu.sg/context/sis_research/article/10879/viewcontent/2024.findings_emnlp.613.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10879
record_format dspace
spelling sg-smu-ink.sis_research-10879
2025-01-02T09:13:42Z
Multimodal misinformation detection by learning from synthetic data with multimodal LLMs
ZENG, Fengzhu
LI, Wenqian
GAO, Wei
PANG, Yan
Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.
2024-11-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/9879
info:doi/10.18653/v1/2024.findings-emnlp.613
https://ink.library.smu.edu.sg/context/sis_research/article/10879/viewcontent/2024.findings_emnlp.613.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
format text
author ZENG, Fengzhu
LI, Wenqian
GAO, Wei
PANG, Yan
title Multimodal misinformation detection by learning from synthetic data with multimodal LLMs
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9879
https://ink.library.smu.edu.sg/context/sis_research/article/10879/viewcontent/2024.findings_emnlp.613.pdf
_version_ 1821237271811063808