Evaluating vision-language models long-chain reasoning ability with multiple ground truths

With recent advances in vision-language models, many researchers have started to evaluate the models' zero-shot ability to answer questions about a video input. However, there is not yet a standardised, “best practice” method for evaluating the quality of a model’s open-ended answer given...

Bibliographic Details
Main Author: Setiadharma, Christopher Arif
Other Authors: Liu Ziwei
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175186
Institution: Nanyang Technological University