Q-align: teaching LMMs for visual scoring via discrete text-defined levels
The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of large multi-modality models (LMMs) on a wide rang...
Main Authors: | Wu, Haoning, Zhang, Zicheng, Zhang, Weixia, Chen, Chaofeng, Liao, Liang, Li, Chunyi, Gao, Yixuan, Wang, Annan, Zhang, Erli, Sun, Wenxiu, Yan, Qiong, Min, Xiongkuo, Zhai, Guangtao, Lin, Weisi |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2024 |
Online Access: | https://hdl.handle.net/10356/178466 http://arxiv.org/abs/2312.17090v1 https://openreview.net/forum?id=PHjkVjR78A https://icml.cc/ |
Institution: | Nanyang Technological University |
Similar Items
- Q-bench: a benchmark for general-purpose foundation models on low-level vision
  by: Wu, Haoning, et al.
  Published: (2024)
- Q-instruct: improving low-level visual abilities for multi-modality foundation models
  by: Wu, Haoning, et al.
  Published: (2024)
- Exploring video quality assessment on user generated contents from aesthetic and technical perspectives
  by: Wu, Haoning, et al.
  Published: (2024)
- FAST-VQA: efficient end-to-end video quality assessment with fragment sampling
  by: Wu, Haoning, et al.
  Published: (2024)
- Evaluation of modal stress resultants in freely vibrating plates
  by: Wang, C.M., et al.
  Published: (2014)