Evaluating vision-language models' long-chain reasoning ability with multiple ground truths
With recent advancements in vision-language models, many researchers have started to evaluate their various zero-shot capabilities for answering questions given a video input. However, there is no standardised, "best practice" method for evaluating the quality of a model's open-ended answer given...
Saved in:
Main Author: Setiadharma, Christopher Arif
Other Authors: Liu, Ziwei
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175186
Institution: Nanyang Technological University
Similar Items
- Towards Ground Truthing Observations in Gray-Box Anomaly Detection
  by: MING, Jiang, et al.
  Published: (2011)
- Learning language to symbol and language to vision mapping for visual grounding
  by: He, Su, et al.
  Published: (2022)
- ROME: Evaluating pre-trained vision-language models on reasoning beyond visual common sense
  by: ZHOU, Kankan, et al.
  Published: (2023)
- Learning to compose and reason with language tree structures for visual grounding
  by: Hong, Richang, et al.
  Published: (2022)
- Is the ground truth really accurate? Dataset purification for automated program repair
  by: YANG, Deheng, et al.
  Published: (2021)