Unified generative and discriminative training for multi-modal Large Language Models
In recent times, Vision-Language Models (VLMs) have been trained under two predominant paradigms. Generative training has enabled Multimodal Large Language Models (MLLMs) to tackle various complex tasks, yet issues such as hallucinations and weak object discrimination persist. Discriminative trainin...
Main Authors: CHOW, Wei; LI, Juncheng; PAN, Kaihang; YU, Qifan; FEI, Hao; GE, Zhiqi; YANG, Shuai; TENG, Siliang; ZHANG, Hanwang; SUN, Qianru
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Online Access:
https://ink.library.smu.edu.sg/sis_research/9743
https://ink.library.smu.edu.sg/context/sis_research/article/10743/viewcontent/NeurIPS_2024_Sugar.pdf
Institution: Singapore Management University
Similar Items
- Towards unified multimodal editing with enhanced knowledge collaboration
  by: PAN, Kaihang, et al.
  Published: (2024)
- Self-supervised modeling for multi-modal understanding
  by: YUE, Xianghu
  Published: (2024)
- Multi-modal video analysis with LLM for descriptive emotion and expression annotation
  by: FAN, Yupei
  Published: (2024)
- Genixer: Empowering multimodal Large Language Models as a powerful data generator
  by: ZHAO, Henry Hengyuan, et al.
  Published: (2024)
- Pro-Cap: Leveraging a frozen vision-language model for hateful meme detection
  by: CAO, Rui, et al.
  Published: (2023)