UniD3: unified discrete diffusion for simultaneous vision-language generation
The recently developed discrete diffusion model performs extraordinarily well on generation tasks, especially text-to-image generation, showing great potential for modeling multimodal signals. In this paper, we leverage these properties and present a unified multimodal generation model, which can p...
Main Authors: | Hu, Minghui, Zheng, Chuanxia, Cham, Tat-Jen, Suganthan, Ponnuthurai Nagaratnam, Yang, Zuopeng, Zheng, Heliang, Wang, Chaoyue, Tao, Dacheng |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Online Access: | https://hdl.handle.net/10356/172665 https://openreview.net/forum?id=8JqINxA-2a |
Institution: | Nanyang Technological University |

Similar Items
- Global context with discrete diffusion in vector quantised modelling for image generation
  by: Hu, Minghui, et al.
  Published: (2023)
- Cocktail: mixing multi-modality controls for text-conditional image generation
  by: Hu, Minghui, et al.
  Published: (2023)
- Pluralistic image completion
  by: Zheng, Chuanxia, et al.
  Published: (2020)
- Pluralistic free-form image completion
  by: Zheng, Chuanxia, et al.
  Published: (2023)
- Experimental evaluation of stochastic configuration networks: is SC algorithm inferior to hyper-parameter optimization method?
  by: Hu, Minghui, et al.
  Published: (2022)