Q-instruct: improving low-level visual abilities for multi-modality foundation models
Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, in which a single model can respond to a broad range of natural human instructions. While existing foundation models have shown exciting potential on low-level...
Main Authors: | Wu, Haoning, Zhang, Zicheng, Zhang, Erli, Chen, Chaofeng, Liao, Liang, Wang, Annan, Xu, Kaixin, Li, Chunyi, Hou, Jingwen, Zhai, Guangtao, Xue, Geng, Sun, Wenxiu, Yan, Qiong, Lin, Weisi |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/178464 ; http://arxiv.org/abs/2311.06783v1 ; https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Q-Instruct_Improving_Low-level_Visual_Abilities_for_Multi-modality_Foundation_Models_CVPR_2024_paper.pdf |
Institution: | Nanyang Technological University |
Similar Items
- Q-bench: a benchmark for general-purpose foundation models on low-level vision
  by: Wu, Haoning, et al.
  Published: (2024)
- Q-align: teaching LMMs for visual scoring via discrete text-defined levels
  by: Wu, Haoning, et al.
  Published: (2024)
- Exploring video quality assessment on user generated contents from aesthetic and technical perspectives
  by: Wu, Haoning, et al.
  Published: (2024)
- FAST-VQA: efficient end-to-end video quality assessment with fragment sampling
  by: Wu, Haoning, et al.
  Published: (2024)
- Neighbourhood representative sampling for efficient end-to-end video quality assessment
  by: Wu, Haoning, et al.
  Published: (2024)