T-SciQ: Teaching multimodal Chain-of-Thought reasoning via large language model signals for science question answering

Bibliographic Details
Main Authors:	WANG, Lei; HU, Yi; HE, Jiabang; XU, Xing; LIU, Ning; LIU, Hui; SHEN, Heng Tao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Complex problems; High quality; Language model; Language processing; Model signals; Multi-modal; Multimodal chains; Natural languages; Artificial Intelligence and Robotics; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/8756 https://ink.library.smu.edu.sg/context/sis_research/article/9759/viewcontent/29884_Article_Text_33938_1_2_20240324_pvoa.pdf
Institution:	Singapore Management University

Description
Summary:	Large Language Models (LLMs) have recently demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. In addition, the annotated rationales are often inaccurate because they omit essential external information. To address these issues, we propose a novel method, termed T-SciQ, that aims to teach science question answering with LLM signals. The T-SciQ approach generates high-quality CoT rationales as teaching signals and uses them to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5%. The code is publicly available at https://github.com/T-SciQ/T-SciQ.

