Q-bench: a benchmark for general-purpose foundation models on low-level vision

The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on low-level visual perception and understandin...

Bibliographic Details
Main Authors:	Wu, Haoning; Zhang, Zicheng; Zhang, Erli; Chen, Chaofeng; Liao, Liang; Wang, Annan; Li, Chunyi; Sun, Wenxiu; Yan, Qiong; Zhai, Guangtao; Lin, Weisi
Other Authors:	College of Computing and Data Science
Format:	Conference or Workshop Item
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Multi-modality large language models; Computer vision
Online Access:	https://hdl.handle.net/10356/178462 http://arxiv.org/abs/2309.14181v3 https://openreview.net/forum?id=0V5TVt9bk0 https://iclr.cc/
Institution:	Nanyang Technological University

Similar Items

Q-instruct: improving low-level visual abilities for multi-modality foundation models
by: Wu, Haoning, et al.
Published: (2024)

Q-align: teaching LMMs for visual scoring via discrete text-defined levels
by: Wu, Haoning, et al.
Published: (2024)

Exploring video quality assessment on user generated contents from aesthetic and technical perspectives
by: Wu, Haoning, et al.
Published: (2024)

FAST-VQA: efficient end-to-end video quality assessment with fragment sampling
by: Wu, Haoning, et al.
Published: (2024)

Benchmark hydroelastic responses of a circular VLFS under wave action
by: Watanabe, E., et al.
Published: (2014)

Evaluation of modal stress resultants in freely vibrating plates
by: Wang, C.M., et al.
Published: (2014)

Exploring the effectiveness of video perceptual representation in blind video quality assessment
by: Liao, Liang, et al.
Published: (2024)

Neighbourhood representative sampling for efficient end-to-end video quality assessment
by: Wu, Haoning, et al.
Published: (2024)

Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection
by: Gao, Wei, et al.
Published: (2021)

汉语情态助动词的主观性和主观化 = THE SUBJECTIVITY AND SUBJECTIFICATION OF MODAL AUXILIARIES IN CHINESE
by: 杨黎黎, et al.
Published: (2015)

On very large scale test collection for landmark image search benchmarking
by: CHENG, Zhiyong, et al.
Published: (2016)

AimigoTutor - tutoring application using multi-modal capabilities
by: Nguyen, Viet Hoang
Published: (2024)

Cross-modal recipe retrieval with stacked attention model
by: CHEN, Jing-Jing, et al.
Published: (2018)

A psychovisual quality metric in free-energy principle
by: Lin, Weisi, et al.
Published: (2013)

Epistemic modality in TED talks on education
by: Ton Nu, My Nhat, et al.
Published: (2019)

BigCloneBench Considered Harmful for Machine Learning
by: Krinke J.
Published: (2023)

Unifying text, tables, and images for multimodal question answering
by: LUO, Haohao, et al.
Published: (2023)

Multifunction integrated electronics test bench
by: Aguas, Giovanni Alvin U., et al.
Published: (1996)

Alleviating the inconsistency of multimodal data in cross-modal retrieval
by: Li, Tieying, et al.
Published: (2024)

Benchmarking foundation models with language-model-as-an-examiner
by: BAI, Yushi, et al.
Published: (2023)

A large scale Linux-Kernel based benchmark for feature location research
by: Xing, Z., et al.
Published: (2014)

Benchmarking Multimedia Databases
by: NARASIMHALU, Arcot Desai, et al.
Published: (1997)

Cross-modal recipe retrieval: How to cook this dish?
by: CHEN, Jingjing, et al.
Published: (2017)

Learning language to symbol and language to vision mapping for visual grounding
by: He, Su, et al.
Published: (2022)

Temporal sentence grounding in videos: a survey and future directions
by: Zhang, Hao, et al.
Published: (2023)

Modalities and Multimodalities
by: Carnielli, Walter, et al.
Published: (2017)

A characterisation of open bisimilarity using an intuitionistic modal logic
by: Ahrn, Ki Yung, et al.
Published: (2018)

Fusing heterogeneous modalities for video and image re-ranking
by: TAN, Hung-Khoon, et al.
Published: (2011)

QuantfolioX: portfolio management application using large language model technology
by: Teo, Charlotte Xuan Qin
Published: (2024)

Continuous benchmarking of serverless cloud providers 2
by: Min Kabar Kyaw
Published: (2024)

The verb in Philippine English: A preliminary analysis of modal would
by: Bautista, Ma. Lourdes S.
Published: (2004)

Vision-language-model-based video quality assessment
by: Zhang, Erli
Published: (2024)

Continuous benchmarking of serverless cloud providers
by: Wong, Yi Pun
Published: (2024)

FHENet: lightweight feature hierarchical exploration network for real-time rail surface defect inspection in RGB-D images
by: Zhou, Wujie, et al.
Published: (2023)

Online multi-face tracking with multi-modality cascaded matching
by: Weng, Zhenyu, et al.
Published: (2024)

Visual attention model analysis and benchmarking
by: Tan, Weisheng.
Published: (2011)

Perceptual image processing algorithm benchmarking
by: Nur Shabrina Rusli.
Published: (2012)

A thorough benchmark and a new model for light field saliency detection
by: Gao, Wei, et al.
Published: (2023)

International Assessment Benchmarks: Inputs to Enhance the K to 12 Assessment Policies
by: Lapinid, Minie Rose C., et al.
Published: (2024)

An empirical study on adaptation methods for large-scale vision-language models
by: Wang, Annan
Published: (2023)