Mitigating style-image hallucination in large vision language models
LLMs are widely applied across various domains, yet a significant challenge remains: their performance deteriorates sharply in out-of-domain scenarios, often leading to increased hallucinations. Despite its importance, this phenomenon has received limited attention in academic research. To address this, we first construct a benchmark dataset using style transfer techniques and employ it to evaluate the out-of-domain performance of several popular large-scale models. Building upon these findings, we introduce CopeCap, a lightweight image captioning model that leverages collaborative prompting to achieve strong out-of-domain performance without requiring additional training.
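The abstract does not specify how the style-transfer benchmark is built. As a rough illustration only, the sketch below assumes a Stable Diffusion img2img pipeline from the diffusers library, with hypothetical folders `source_photos/` and `style_benchmark/`; the thesis's actual construction method may differ.

```python
# Hypothetical sketch: derive a style-shifted benchmark from natural photos.
# The style transfer method is an assumption; the thesis does not state it.
from pathlib import Path

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Target styles whose statistics differ from the photographic training domain.
STYLES = ["an oil painting", "a pencil sketch", "a cartoon illustration"]

out_dir = Path("style_benchmark")
out_dir.mkdir(exist_ok=True)

for photo_path in Path("source_photos").glob("*.jpg"):  # hypothetical input folder
    photo = Image.open(photo_path).convert("RGB").resize((512, 512))
    for style in STYLES:
        # strength < 1 preserves the original scene content while shifting
        # style, so the photo's ground-truth captions stay valid for scoring.
        styled = pipe(
            prompt=f"{style} of this scene",
            image=photo,
            strength=0.55,
            guidance_scale=7.5,
        ).images[0]
        styled.save(out_dir / f"{photo_path.stem}_{style.split()[-1]}.png")
```

Pairing each styled image with its source photo's caption is what would let the benchmark attribute any drop in caption quality to the style shift alone.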
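CopeCap's collaborative-prompting mechanism is likewise not detailed in this record. One plausible training-free reading, sketched below, is to let two frozen models collaborate at inference time: a zero-shot CLIP probe detects the rendering style, and that style is handed to a BLIP captioner as a prompt prefix. The model choices, the `STYLE_PROMPTS` table, and the `caption` helper are illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch of "collaborative prompting": a CLIP style probe and a
# frozen BLIP captioner cooperate at inference time, with no extra training.
import torch
from PIL import Image
from transformers import (
    BlipForConditionalGeneration,
    BlipProcessor,
    CLIPModel,
    CLIPProcessor,
)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

STYLE_PROMPTS = {
    "a photo": "a photo of",
    "an oil painting": "an oil painting of",
    "a pencil sketch": "a pencil sketch of",
    "a cartoon illustration": "a cartoon of",
}

def caption(image_path: str) -> str:
    image = Image.open(image_path).convert("RGB")

    # Step 1: zero-shot style detection with CLIP.
    labels = list(STYLE_PROMPTS)
    inputs = clip_proc(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)
    style = labels[probs.argmax().item()]

    # Step 2: condition the captioner on the detected style, so it describes
    # the scene content rather than hallucinating photo-domain details.
    blip_inputs = blip_proc(image, text=STYLE_PROMPTS[style], return_tensors="pt")
    with torch.no_grad():
        out = blip.generate(**blip_inputs, max_new_tokens=30)
    return blip_proc.decode(out[0], skip_special_tokens=True)

print(caption("style_benchmark/000001_painting.png"))  # hypothetical file
```

Because both components stay frozen, this kind of pipeline matches the abstract's claim of needing no additional training, though the real CopeCap may combine its prompts differently.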
Saved in: DR-NTU (Nanyang Technological University)

Main Author: He, Guoshun
Other Authors: Alex Chichung Kot (supervisor)
School: School of Electrical and Electronic Engineering
Format: Thesis-Master by Coursework
Degree: Master's degree
Language: English
Published: Nanyang Technological University, 2025
Subjects: Engineering; Out-of-domain; Hallucination; Lightweight model
Online Access: https://hdl.handle.net/10356/182918
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-182918
Citation: He, G. (2025). Mitigating style-image hallucination in large vision language models. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182918