Evaluating vision-language models long-chain reasoning ability with multiple ground truths

With the recent advancements in vision-language models, many researchers start to evaluate their various zero-shot capabilities to answer questions given a video input. However, there has not been a standardised and “best practice” method to evaluate the quality of a model’s open-ended answer given...

Full description

Saved in:
Bibliographic Details
Main Author: Setiadharma, Christopher Arif
Other Authors: Liu Ziwei
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/175186
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Be the first to leave a comment!
You must be logged in first