Cocktail: mixing multi-modality controls for text-conditional image generation
Text-conditional diffusion models are able to generate high-fidelity images with diverse contents. However, linguistic representations frequently exhibit ambiguous descriptions of the envisioned objective imagery, requiring the incorporation of additional control signals to bolster the efficacy of t...
Main Authors: | Hu, Minghui, Zheng, Jianbin, Liu, Daqing, Zheng, Chuanxia, Wang, Chaoyue, Tao, Dacheng, Cham, Tat-Jen |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Online Access: | https://hdl.handle.net/10356/172668 https://nips.cc/virtual/2023/calendar |
Institution: | Nanyang Technological University |
Similar Items
- UniD3: unified discrete diffusion for simultaneous vision-language generation
  by: Hu, Minghui, et al.
  Published: (2023)
- Pluralistic free-form image completion
  by: Zheng, Chuanxia, et al.
  Published: (2023)
- Pluralistic image completion
  by: Zheng, Chuanxia, et al.
  Published: (2020)
- Sem2NeRF: converting single-view semantic masks to neural radiance fields
  by: Chen, Yuedong, et al.
  Published: (2023)
- Text extraction from name card images
  by: LIN LIN
  Published: (2010)