Multimodal misinformation detection by learning from synthetic data with multimodal LLMs
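Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.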
Main Authors: | ZENG, Fengzhu; LI, Wenqian; GAO, Wei; PANG, Yan |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Subjects: | Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/9879 https://ink.library.smu.edu.sg/context/sis_research/article/10879/viewcontent/2024.findings_emnlp.613.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-10879 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-10879 2025-01-02T09:13:42Z Multimodal misinformation detection by learning from synthetic data with multimodal LLMs ZENG, Fengzhu LI, Wenqian GAO, Wei PANG, Yan Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V. 2024-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9879 info:doi/10.18653/v1/2024.findings-emnlp.613 https://ink.library.smu.edu.sg/context/sis_research/article/10879/viewcontent/2024.findings_emnlp.613.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems |
spellingShingle |
Databases and Information Systems ZENG, Fengzhu LI, Wenqian GAO, Wei PANG, Yan Multimodal misinformation detection by learning from synthetic data with multimodal LLMs |
description |
Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V. |
format |
text |
author |
ZENG, Fengzhu LI, Wenqian GAO, Wei PANG, Yan |
author_facet |
ZENG, Fengzhu LI, Wenqian GAO, Wei PANG, Yan |
author_sort |
ZENG, Fengzhu |
title |
Multimodal misinformation detection by learning from synthetic data with multimodal LLMs |
title_sort |
multimodal misinformation detection by learning from synthetic data with multimodal llms |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2024 |
url |
https://ink.library.smu.edu.sg/sis_research/9879 https://ink.library.smu.edu.sg/context/sis_research/article/10879/viewcontent/2024.findings_emnlp.613.pdf |
_version_ |
1821237271811063808 |