Evaluating vision-language models long-chain reasoning ability with multiple ground truths

With the recent advancements in vision-language models, many researchers start to evaluate their various zero-shot capabilities to answer questions given a video input. However, there has not been a standardised and “best practice” method to evaluate the quality of a model’s open-ended answer given...

وصف كامل

محفوظ في:

التفاصيل البيبلوغرافية
المؤلف الرئيسي:	Setiadharma, Christopher Arif
مؤلفون آخرون:	Liu Ziwei
التنسيق:	Final Year Project
اللغة:	English
منشور في:	Nanyang Technological University 2024
الموضوعات:	Computer and Information Science
الوصول للمادة أونلاين:	https://hdl.handle.net/10356/175186
الوسوم:	إضافة وسم لا توجد وسوم, كن أول من يضع وسما على هذه التسجيلة!
المؤسسة:	Nanyang Technological University
اللغة:	English

الانترنت

https://hdl.handle.net/10356/175186

Evaluating vision-language models long-chain reasoning ability with multiple ground truths

الانترنت

مواد مشابهة