Large multimodal models for visual reasoning

This paper introduces a novel framework for enhancing visual spatial reasoning by leveraging the strengths of Large Language Models (LLMs) and Vision-Language Models (VLMs). We propose two complementary methods: LLMGuide and LLMVerify. LLMGuide uses the LLM to generate detailed step-by-step instruct...
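
The abstract is cut off in this record, so the sketch below is only a minimal illustration of the guide-then-answer flow it gestures at, not the authors' implementation. Every name here (toy_llm, toy_vlm, answer_with_llm_guidance) and the prompt wording are hypothetical stand-ins; the stubs return canned strings so the example runs without any real model, and the final verification step is assumed from the name LLMVerify only.

```python
# Hypothetical illustration of an LLMGuide-style pipeline: an LLM writes
# step-by-step instructions for a spatial question, and a VLM answers the
# question with those instructions prepended to its prompt. The two "models"
# below are canned stubs so the script runs standalone; they are NOT the
# framework described in this record.

def toy_llm(prompt: str) -> str:
    """Stand-in for an instruction-generating LLM (returns a fixed plan)."""
    return (
        "1. Locate each object mentioned in the question.\n"
        "2. Compare their positions along the relevant axis.\n"
        "3. State the spatial relation in one short sentence."
    )

def toy_vlm(image_path: str, prompt: str) -> str:
    """Stand-in for a vision-language model (returns a fixed answer)."""
    return "The mug is to the left of the laptop."

def answer_with_llm_guidance(image_path: str, question: str) -> dict:
    """Guide-then-answer flow: the LLM plans the steps, the VLM follows them,
    and the LLM is asked once more to check the answer (assumed behaviour)."""
    guidance = toy_llm(
        "Write step-by-step instructions for answering this spatial "
        f"reasoning question about an image:\n{question}"
    )
    answer = toy_vlm(
        image_path,
        f"Question: {question}\nFollow these steps:\n{guidance}\nAnswer:",
    )
    verdict = toy_llm(
        f"Check whether this answer follows the steps and is consistent:\n"
        f"Steps:\n{guidance}\nAnswer: {answer}"
    )
    return {"guidance": guidance, "answer": answer, "verdict": verdict}

if __name__ == "__main__":
    result = answer_with_llm_guidance("scene.jpg", "Is the mug left of the laptop?")
    print(result["guidance"])
    print(result["answer"])
```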

Bibliographic Details
Main Author: Duong, Ngoc Yen
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181503