Mitigating style-image hallucination in large vision language models

LLMs are widely applied across various domains, yet a significant challenge remains: their performance deteriorates sharply in out-of-domain scenarios, often leading to increased hallucinations. Despite its importance, this phenomenon has received limited attention in academic research. To address this, we first construct a benchmark dataset using style transfer techniques and employ it to evaluate the out-of-domain performance of several popular large-scale models. Building upon these findings, we introduce CopeCap, a lightweight image captioning model that leverages collaborative prompting to achieve strong out-of-domain performance without requiring additional training.

Bibliographic Details
Main Author: He, Guoshun
Other Authors: Alex Chichung Kot
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2025
Subjects: Engineering; Out-of-domain; Hallucination; Lightweight model
Online Access: https://hdl.handle.net/10356/182918