Q-instruct: improving low-level visual abilities for multi-modality foundation models

Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks: a single model that can respond to a broad range of natural human instructions. While existing foundation models have shown exciting potential on low-level...

Bibliographic Details
Main Authors:	Wu, Haoning; Zhang, Zicheng; Zhang, Erli; Chen, Chaofeng; Liao, Liang; Wang, Annan; Xu, Kaixin; Li, Chunyi; Hou, Jingwen; Zhai, Guangtao; Xue, Geng; Sun, Wenxiu; Yan, Qiong; Lin, Weisi
Other Authors:	College of Computing and Data Science
Format:	Conference or Workshop Item
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Multi-modality large language models; Computer vision
Online Access:	https://hdl.handle.net/10356/178464 http://arxiv.org/abs/2311.06783v1 https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Q-Instruct_Improving_Low-level_Visual_Abilities_for_Multi-modality_Foundation_Models_CVPR_2024_paper.pdf
Institution:	Nanyang Technological University

Similar Items

Q-bench: a benchmark for general-purpose foundation models on low-level vision
by: Wu, Haoning, et al.
Published: (2024)

Q-align: teaching LMMs for visual scoring via discrete text-defined levels
by: Wu, Haoning, et al.
Published: (2024)

Exploring video quality assessment on user generated contents from aesthetic and technical perspectives
by: Wu, Haoning, et al.
Published: (2024)

FAST-VQA: efficient end-to-end video quality assessment with fragment sampling
by: Wu, Haoning, et al.
Published: (2024)

Neighbourhood representative sampling for efficient end-to-end video quality assessment
by: Wu, Haoning, et al.
Published: (2024)

Evaluation of modal stress resultants in freely vibrating plates
by: Wang, C.M., et al.
Published: (2014)

Exploring the effectiveness of video perceptual representation in blind video quality assessment
by: Liao, Liang, et al.
Published: (2024)

Collaborative cross-modal fusion with Large Language Model for recommendation
by: LIU, Zhongzhou, et al.
Published: (2024)

Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection
by: Gao, Wei, et al.
Published: (2021)

汉语情态助动词的主观性和主观化 = THE SUBJECTIVITY AND SUBJECTIFICATION OF MODAL AUXILIARIES IN CHINESE
by: 杨黎黎, et al.
Published: (2015)

Blind video quality prediction by uncovering human video perceptual representation
by: Liao, Liang, et al.
Published: (2024)

AimigoTutor - tutoring application using multi-modal capabilities
by: Nguyen, Viet Hoang
Published: (2024)

Cross-modal recipe retrieval with stacked attention model
by: CHEN, Jing-Jing, et al.
Published: (2018)

A psychovisual quality metric in free-energy principle
by: Lin, Weisi, et al.
Published: (2013)

Epistemic modality in TED talks on education
by: Ton Nu, My Nhat, et al.
Published: (2019)

Unifying text, tables, and images for multimodal question answering
by: LUO, Haohao, et al.
Published: (2023)

Alleviating the inconsistency of multimodal data in cross-modal retrieval
by: Li, Tieying, et al.
Published: (2024)

Integrated framework for developing instructional videos for foundational computing courses
by: SHIM, Kyong Jin, et al.
Published: (2021)

Cross-modal recipe retrieval: How to cook this dish?
by: CHEN, Jingjing, et al.
Published: (2017)

Learning language to symbol and language to vision mapping for visual grounding
by: He, Su, et al.
Published: (2022)

Temporal sentence grounding in videos: a survey and future directions
by: Zhang, Hao, et al.
Published: (2023)

Modalities and Multimodalities
by: Carnielli, Walter, et al.
Published: (2017)

A characterisation of open bisimilarity using an intuitionistic modal logic
by: Ahn, Ki Yung, et al.
Published: (2018)

Fusing heterogeneous modalities for video and image re-ranking
by: TAN, Hung-Khoon, et al.
Published: (2011)

A stretchable and transparent electrode based on PEGylated silk fibroin for in vivo dual-modal neural-vascular activity probing
by: Cui, Yajing, et al.
Published: (2022)

The verb in Philippine English: A preliminary analysis of modal would
by: Bautista, Ma. Lourdes S.
Published: (2004)

QuantfolioX: portfolio management application using large language model technology
by: Teo, Charlotte Xuan Qin
Published: (2024)

Vision-language-model-based video quality assessment
by: Zhang, Erli
Published: (2024)

FHENet: lightweight feature hierarchical exploration network for real-time rail surface defect inspection in RGB-D images
by: Zhou, Wujie, et al.
Published: (2023)

Online multi-face tracking with multi-modality cascaded matching
by: Weng, Zhenyu, et al.
Published: (2024)

Inference acceleration of large language models
by: Zhang, Boyu
Published: (2024)

An empirical study on adaptation methods for large-scale vision-language models
by: Wang, Annan
Published: (2023)

Can online reviews reveal a product's true quality? Empirical findings and analytical modeling of online word-of-mouth communication
by: HU, Nan, et al.
Published: (2006)

Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
by: Bai, Lubin, et al.
Published: (2023)

Benchmark hydroelastic responses of a circular VLFS under wave action
by: Watanabe, E., et al.
Published: (2014)

The development of online computer aided instruction (CAI) in HTML, CSS and Javascript for 4th year highschool of Christ the King College of Cavite foundation.
by: Moje, Vhil Laurence A., et al.
Published: (2011)

Learning a cross-modal hashing network for multimedia search
by: Tan, Yap Peng, et al.
Published: (2018)

Microfluidics-based microbubbles in methylene blue solution for photoacoustic and ultrasound imaging
by: Das, Dhiman, et al.
Published: (2018)

Object-level attention for aesthetic rating distribution prediction
by: Hou, Jingwen, et al.
Published: (2020)

Evolutionary computation in power systems
by: Miranda, V., et al.
Published: (2014)