Evaluating vision-language models long-chain reasoning ability with multiple ground truths

With recent advances in vision-language models, many researchers have started to evaluate these models' zero-shot ability to answer questions given a video input. However, there is not yet a standardised “best practice” method for evaluating the quality of a model’s open-ended answer given...

Bibliographic Details
Main Author:	Setiadharma, Christopher Arif
Other Authors:	Liu Ziwei
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/175186
Institution:	Nanyang Technological University
