Large multimodal models for visual reasoning

This paper introduces a novel framework for enhancing visual spatial reasoning by leveraging the strengths of Large Language Models (LLMs) and Vision-Language Models (VLMs). We propose two complementary methods: LLMGuide and LLMVerify. LLMGuide uses the LLM to generate detailed step-by-step instruct...

Bibliographic Details
Main Author:	Duong, Ngoc Yen
Other Authors:	Luu Anh Tuan
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Natural language processing; Multimodal learning
Online Access:	https://hdl.handle.net/10356/181503
