Genixer : Empowering multimodal Large Language Models as a powerful data generator

Multimodal Large Language Models (MLLMs) demonstrate exceptional problem-solving capabilities, but few studies gauge their ability to generate visual instruction tuning data. This paper explores empowering MLLMs to generate such data independently, without relying on GPT-4. We introduce Genixer, a comprehensive data generation pipeline consisting of four key steps: (i) instruction data collection, (ii) instruction template design, (iii) empowering MLLMs, and (iv) data generation and filtering. We further outline two modes of data generation, task-agnostic and task-specific, enabling controllable output. We demonstrate that training LLaVA1.5 on a synthetic VQA-like dataset improves performance on 10 of 12 multimodal benchmarks. Similarly, the grounding MLLM Shikra, when trained with a synthetic REC-like dataset, improves on 7 of 8 REC datasets. Through experiments and synthetic data analysis, our findings are: (1) current MLLMs can serve as robust data generators without assistance from GPT-4V; (2) MLLMs trained with task-specific datasets can surpass GPT-4V in generating complex instruction tuning data; (3) synthetic datasets enhance performance across various multimodal benchmarks and help mitigate model hallucinations.
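
As a rough illustration of the pipeline summarized in the abstract, the Python sketch below shows how a generate-then-filter loop with task-agnostic or task-specific prompt templates might look. It is not the authors' implementation: the model interface (mllm.generate, mllm.self_evaluate), the prompt strings, and the keep_threshold value are hypothetical placeholders for steps (ii)-(iv).

    # Illustrative sketch only -- not the Genixer code. All function, prompt, and
    # parameter names here are hypothetical stand-ins for the pipeline steps
    # described in the abstract (template design, generation, filtering).
    from dataclasses import dataclass

    @dataclass
    class Sample:
        image_path: str
        instruction: str
        response: str
        score: float = 0.0

    # Step (ii): templates steering the tuned MLLM toward task-agnostic or
    # task-specific generation (e.g. VQA-like or REC-like data).
    GENERATION_PROMPTS = {
        "task_agnostic": "Generate an instruction-response pair for this image.",
        "vqa": "Generate a visual question and its answer for this image.",
        "rec": "Generate a referring expression and its region for this image.",
    }

    def generate_synthetic_data(mllm, image_paths, mode="vqa", keep_threshold=0.7):
        """Step (iv): generate candidate pairs, then filter out low-quality ones."""
        prompt = GENERATION_PROMPTS[mode]
        kept = []
        for image_path in image_paths:
            # Hypothetical MLLM call producing an instruction-response pair.
            instruction, response = mllm.generate(image_path, prompt)
            # Hypothetical filtering score (e.g. a self-evaluation pass).
            score = mllm.self_evaluate(image_path, instruction, response)
            if score >= keep_threshold:
                kept.append(Sample(image_path, instruction, response, score))
        return kept  # synthetic instruction tuning data for retraining, e.g. LLaVA1.5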

Bibliographic Details
Main Authors: ZHAO, Henry Hengyuan, ZHOU, Pan, SHOU, Mike Zheng
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024-09-01
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
Subjects: Large Language Models; LLMs; Data generation pipeline; Data generators; MLLMs; Multimodal Large Language Models; Artificial Intelligence and Robotics; Computer Sciences
DOI: 10.48550/arXiv.2312.06731
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Online Access: https://ink.library.smu.edu.sg/sis_research/9600
https://ink.library.smu.edu.sg/context/sis_research/article/10600/viewcontent/Genixer.pdf