Mitigating fine-grained hallucination by fine-tuning large vision-language models with caption rewrites
Large language models (LLMs) have shown remarkable performance in natural language processing (NLP) tasks. To comprehend and execute diverse human instructions over image data, instruction-tuned large vision-language models (LVLMs) have been introduced. However, LVLMs may suffer from different types of object hallucinations, yet they are currently evaluated only for coarse-grained object hallucinations (i.e., generated objects that do not exist in the input image). Fine-grained object attributes and behaviors that do not exist in the image may still be generated but are not measured by current evaluation methods. In this paper, we therefore focus on reducing fine-grained hallucinations of LVLMs. We propose ReCaption, a framework with two components: rewriting captions using ChatGPT and fine-tuning instruction-tuned LVLMs on the rewritten captions. We also propose a fine-grained probing-based evaluation method named Fine-Grained Object Hallucination Evaluation (FGHE). Our experimental results demonstrate that ReCaption effectively reduces fine-grained object hallucination for different LVLMs and improves their text generation quality. The code can be found at https://github.com/Anonymousanoy/FOHE.
Main Authors: WANG, Lei; HE, Jiabang; LI, Shenshen; LIU, Ning; LIM, Ee-peng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Collection: Research Collection School Of Computing and Information Systems
Subjects: Hallucination Mitigation; Large Vision-Language Models; Artificial Intelligence and Robotics; Databases and Information Systems
DOI: 10.1007/978-3-031-53302-0_3
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://ink.library.smu.edu.sg/sis_research/8750
https://ink.library.smu.edu.sg/context/sis_research/article/9753/viewcontent/MitigatingFine_GrainedHallucination_av.pdf
Institution: Singapore Management University
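
For readers who want a concrete picture of the first ReCaption component described in the abstract (caption rewriting with ChatGPT), below is a minimal, hypothetical sketch. It assumes the OpenAI Python client (openai>=1.0) with an OPENAI_API_KEY set in the environment; the prompt wording, model choice, and function names are illustrative assumptions and are not taken from the authors' released code (see the GitHub link in the abstract).

# Hypothetical sketch of the caption-rewrite step described in the abstract:
# ask ChatGPT to rewrite an image caption so that object attributes and
# behaviors are stated explicitly, yielding extra fine-tuning data for an LVLM.
# Prompt wording and model choice are assumptions, not the authors' code.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

client = OpenAI()

REWRITE_PROMPT = (
    "Rewrite the following image caption. Keep every object, attribute, "
    "and action that is stated, make them explicit, and do not add any "
    "detail that is not in the caption.\n\nCaption: {caption}\n\nRewrite:"
)

def rewrite_caption(caption: str, model: str = "gpt-3.5-turbo") -> str:
    """Return a ChatGPT rewrite of one caption (single chat-completion call)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(caption=caption)}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    # Example usage on a single caption; in the paper's setting the rewrites
    # would be collected for a whole caption set and used for fine-tuning.
    print(rewrite_caption("A brown dog catching a red frisbee on a beach."))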