LOVA3 : Learning to visual question answering, asking and assessment

Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge. By enhancing these capabilities, humans can more effectively utilize data, leading to better comprehension and learning outcomes. Current Multimodal Large Language Mo...

Bibliographic Details
Main Authors:	ZHAO, Henry Hengyuan; ZHOU, Pan; GAO, Difei; SHOU, BAI; SHOU, Mike Zheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Multimodal large language models; Questioning and assessment; Machine learning; Natural language processing; Artificial Intelligence and Robotics; Computer Sciences
Online Access:	https://ink.library.smu.edu.sg/sis_research/9730 https://ink.library.smu.edu.sg/context/sis_research/article/10730/viewcontent/LoVA.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10730
record_format	dspace
spelling	sg-smu-ink.sis_research-10730 2024-12-16T06:54:55Z 2024-12-01T08:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Multimodal large language models; Questioning and assessment; Machine learning; Natural language processing; Artificial Intelligence and Robotics; Computer Sciences
description	Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge. By enhancing these capabilities, humans can more effectively utilize data, leading to better comprehension and learning outcomes. Current Multimodal Large Language Models (MLLMs) primarily focus on question answering, often neglecting the full potential of questioning and assessment skills. Inspired by the human learning mechanism, we introduce LOVA3, an innovative framework named “Learning tO Visual question Answering, Asking and Assessment,” designed to equip MLLMs with these additional capabilities. Our approach involves the creation of two supplementary training tasks, GenQA and EvalQA, aimed at fostering the skills of asking and assessing questions in the context of images. To develop the questioning ability, we compile a comprehensive set of multimodal foundational tasks. For assessment, we introduce a new benchmark called EvalQABench, comprising 64,000 training samples (split evenly between positive and negative samples) and 5,000 validation and testing samples. We posit that enhancing MLLMs with the capabilities to answer, ask, and assess questions will enhance their multimodal comprehension, ultimately improving overall performance. To validate this hypothesis, we train MLLMs using the LOVA3 framework and evaluate them on a range of multimodal datasets and benchmarks. Our results demonstrate consistent performance gains, underscoring the critical role of these additional tasks in fostering comprehensive intelligence in MLLMs.
format	text
author	ZHAO, Henry Hengyuan; ZHOU, Pan; GAO, Difei; SHOU, BAI; SHOU, Mike Zheng
author_facet	ZHAO, Henry Hengyuan; ZHOU, Pan; GAO, Difei; SHOU, BAI; SHOU, Mike Zheng
author_sort	ZHAO, Henry Hengyuan
title	LOVA3 : Learning to visual question answering, asking and assessment
title_short	LOVA3 : Learning to visual question answering, asking and assessment
title_full	LOVA3 : Learning to visual question answering, asking and assessment
title_fullStr	LOVA3 : Learning to visual question answering, asking and assessment
title_full_unstemmed	LOVA3 : Learning to visual question answering, asking and assessment
title_sort	lova3 : learning to visual question answering, asking and assessment
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9730 https://ink.library.smu.edu.sg/context/sis_research/article/10730/viewcontent/LoVA.pdf
_version_	1819113121498791936
