Evaluating vision-language models long-chain reasoning ability with multiple ground truths

With recent advances in vision-language models, many researchers have started to evaluate the models' zero-shot ability to answer questions about a video input. However, there is not yet a standardised, “best practice” method for evaluating the quality of a model’s open-ended answer given...

Bibliographic Details
Main Author:	Setiadharma, Christopher Arif
Other Authors:	Liu Ziwei
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/175186
Institution:	Nanyang Technological University
