VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL

Bibliographic Details
Main Author:	Amadeus Irawan, Patrick
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/86165
Institution:	Institut Teknologi Bandung

id	id-itb.:86165
spelling	id-itb.:86165 2024-09-15T05:27:35Z Synthetic data generation, VQA reasoning, LVLM, LLaVA, prompt.
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Visual Question Answering (VQA) systems need a substantial amount of data that includes reasoning in order to produce rational and reliable outputs. However, creating VQA reasoning data is resource-intensive, which has driven the exploration of more efficient data creation methods. This thesis explores the use of Large Vision Language Models (LVLMs) to generate high-quality synthetic VQA reasoning data more efficiently. Experiments combined three variants of the LLaVA model with three prompting techniques: the first used a single naive instruction, the second ensembled the outputs of several more complex instructions, and the third used naive instructions complemented by object location annotations within the images. The synthetic data was evaluated in terms of quality and structural similarity to human-generated data. Data generation with the developed system was up to 19.8 times more time-efficient, with only a 4% decrease in quality compared to human-created data. The findings highlight the potential of LVLMs, paired with appropriate prompting techniques, to produce high-quality VQA reasoning data.
format	Final Project
author	Amadeus Irawan, Patrick
spellingShingle	Amadeus Irawan, Patrick VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL
author_facet	Amadeus Irawan, Patrick
author_sort	Amadeus Irawan, Patrick
title	VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL
title_short	VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL
title_full	VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL
title_fullStr	VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL
title_full_unstemmed	VISUAL QUESTION ANSWERING REASONING SYNTHETIC DATA GENERATION USING LARGE VISION LANGUAGE MODEL
title_sort	visual question answering reasoning synthetic data generation using large vision language model
url	https://digilib.itb.ac.id/gdl/view/86165
_version_	1822283344694476800

Similar Items