Large multimodal models for visual reasoning

This paper introduces a novel framework for enhancing visual spatial reasoning by leveraging the strengths of Large Language Models (LLMs) and Vision-Language Models (VLMs). We propose two complementary methods: LLMGuide and LLMVerify. LLMGuide uses the LLM to generate detailed step-by-step instruct...

Full description

Saved in:
Bibliographic Details
Main Author: Duong, Ngoc Yen
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/181503
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English