Contextual object detection with multimodal large language models
Recent Multimodal Large Language Models (MLLMs) are remarkable in vision-language tasks, such as image captioning and question answering, but lack the essential perception ability, i.e., object detection. In this work, we address this limitation by introducing a novel research problem of contextual...
Main Authors: | Zang, Yuhang, Li, Wei, Han, Jun, Zhou, Kaiyang, Loy, Chen Change |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Article |
Language: | English |
Published: | 2024 |
Online Access: | https://hdl.handle.net/10356/181063 |
Institution: | Nanyang Technological University |
Similar Items
- Semi-supervised and long-tailed object detection with Cascadematch
  by: Zang, Yuhang, et al.
  Published: (2023)
- Information-theoretic analysis of input strokes in visual object cutout
  by: Mu, Y., et al.
  Published: (2014)
- Car cabin object detection using artificial intelligence (multimodal object detection)
  by: Li, Ying
  Published: (2024)
- Hierarchical object groups for scene classification
  by: Sadovnik A., et al.
  Published: (2018)
- Efficient salient region detection with soft image abstraction
  by: CHENG, Ming-Ming, et al.
  Published: (2013)