Large multimodal models for visual reasoning
This paper introduces a novel framework for enhancing visual spatial reasoning by leveraging the strengths of Large Language Models (LLMs) and Vision-Language Models (VLMs). We propose two complementary methods: LLMGuide and LLMVerify. LLMGuide uses the LLM to generate detailed step-by-step instruct...
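The abstract is cut off, so the sketch below is only an illustration of the two methods as named: a hypothetical `llm_guide` in which the LLM drafts step-by-step instructions for the VLM (as the abstract states), and a hypothetical `llm_verify` whose check-then-retry loop is an assumption inferred from its name. The generic `llm` and `vlm` callables stand in for real model APIs; none of this is the thesis's actual implementation.

```python
# Illustrative sketch only: `llm` and `vlm` are generic callables standing in
# for real model APIs; prompts and control flow are assumptions.

def llm_guide(llm, vlm, image, question: str) -> str:
    """LLMGuide (per the abstract): the LLM generates detailed step-by-step
    instructions, which the VLM then follows to answer the visual question."""
    steps = llm(
        "Write step-by-step instructions for answering this "
        f"visual spatial reasoning question: {question}"
    )
    return vlm(image, f"Follow these steps:\n{steps}\n\nQuestion: {question}")

def llm_verify(llm, vlm, image, question: str) -> str:
    """LLMVerify (assumed from the name): the VLM answers first, then the LLM
    checks the answer for consistency and triggers one retry if the check fails."""
    answer = vlm(image, question)
    verdict = llm(
        f"Question: {question}\nProposed answer: {answer}\n"
        "Is this answer logically consistent with the question? Reply YES or NO."
    )
    if verdict.strip().upper().startswith("NO"):
        answer = vlm(image, f"{question}\n(Your earlier answer was rejected; reconsider.)")
    return answer
```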
| Main Author: | Duong, Ngoc Yen |
|---|---|
| Other Authors: | Luu Anh Tuan |
| Format: | Final Year Project |
| Language: | English |
| Published: | Nanyang Technological University, 2024 |
| Online Access: | https://hdl.handle.net/10356/181503 |
| Institution: | Nanyang Technological University |
Similar Items
- T-SciQ: Teaching multimodal Chain-of-Thought reasoning via large language model signals for science question answering
  by: WANG, Lei, et al.
  Published: (2024)
- LOVA3: Learning to visual question answering, asking and assessment
  by: ZHAO, Henry Hengyuan, et al.
  Published: (2024)
- Multimodal few-shot classification without attribute embedding
  by: Chang, Jun Qing, et al.
  Published: (2024)
- A multimodal approach to automatic personality recognition on Filipino social media data
  by: Secuya, Alfonso C.
  Published: (2021)
- M2Lens: Visualizing and explaining multimodal models for sentiment analysis
  by: WANG, Xingbo, et al.
  Published: (2022)