Large multimodal models for visual reasoning

This paper introduces a novel framework for enhancing visual spatial reasoning by leveraging the strengths of Large Language Models (LLMs) and Vision-Language Models (VLMs). We propose two complementary methods: LLMGuide and LLMVerify. LLMGuide uses the LLM to generate detailed step-by-step instruct...

Bibliographic Details
Main Author:	Duong, Ngoc Yen
Other Authors:	Luu Anh Tuan
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Natural language processing; Multimodal learning
Online Access:	https://hdl.handle.net/10356/181503
Institution:	Nanyang Technological University
