Contextual object detection with multimodal large language models

Recent Multimodal Large Language Models (MLLMs) are remarkable in vision-language tasks, such as image captioning and question answering, but lack the essential perception ability, i.e., object detection. In this work, we address this limitation by introducing a novel research problem of contextual...

Full description

Saved in:
Bibliographic Details
Main Authors: Zang, Yuhang, Li, Wei, Han, Jun, Zhou, Kaiyang, Loy, Chen Change
Other Authors: College of Computing and Data Science
Format: Article
Language:English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/181063
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Be the first to leave a comment!
You must be logged in first