UniD3: unified discrete diffusion for simultaneous vision-language generation

The recently developed discrete diffusion model performs extraordinarily well in generation tasks, especially in the text-to-image task, showing great potential for modeling multimodal signals. In this paper, we leverage these properties and present a unified multimodal generation model, which can p...

Bibliographic Details
Main Authors:	Hu, Minghui; Zheng, Chuanxia; Cham, Tat-Jen; Suganthan, Ponnuthurai Nagaratnam; Yang, Zuopeng; Zheng, Heliang; Wang, Chaoyue; Tao, Dacheng
Other Authors:	School of Computer Science and Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Diffusion; Computer Graphics
Online Access:	https://hdl.handle.net/10356/172665 https://openreview.net/forum?id=8JqINxA-2a
Institution:	Nanyang Technological University

Similar Items

Global context with discrete diffusion in vector quantised modelling for image generation
by: Hu, Minghui, et al.
Published: (2023)

Cocktail: mixing multi-modality controls for text-conditional image generation
by: Hu, Minghui, et al.
Published: (2023)

Pluralistic image completion
by: Zheng, Chuanxia, et al.
Published: (2020)

Pluralistic free-form image completion
by: Zheng, Chuanxia, et al.
Published: (2023)

Experimental evaluation of stochastic configuration networks: is SC algorithm inferior to hyper-parameter optimization method?
by: Hu, Minghui, et al.
Published: (2022)

The spatially-correlative loss for various image translation tasks
by: Zheng, Chuanxia, et al.
Published: (2021)

Bridging global context interactions for high-fidelity image completion
by: Zheng, Chuanxia, et al.
Published: (2023)

Sem2NeRF: converting single-view semantic masks to neural radiance fields
by: Chen, Yuedong, et al.
Published: (2023)

A unified 3D human motion synthesis model via conditional variational auto-encoder
by: Cai, Yujun, et al.
Published: (2023)

Visiting the Invisible: layer-by-layer completed scene decomposition
by: Zheng, Chuanxia, et al.
Published: (2023)

T2Net: synthetic-to-realistic translation for solving single-image depth estimation tasks
by: Zheng, Chuanxia, et al.
Published: (2020)

Discrete geodesic graphs
by: Fang, Zheng
Published: (2019)

AgileGAN: stylizing portraits by inversion-consistent transfer learning
by: Song, Guoxian, et al.
Published: (2023)

Shading‐based surface recovery using subdivision‐based representation
by: Deng, Teng, et al.
Published: (2020)

ClusteringSDF: self-organized neural implicit surfaces for 3D decomposition
by: Wu, Tianhao, et al.
Published: (2024)

Real-time shadow-aware portrait relighting in virtual backgrounds for realistic telepresence
by: Song, Guoxian, et al.
Published: (2023)

Synthesizing photorealistic images with deep generative learning
by: Zheng, Chuanxia
Published: (2021)

Interactive display walls based on camera-projector system
by: Cham, Tat Jen
Published: (2008)

Half-body portrait relighting with overcomplete lighting representation
by: Song, Guoxian, et al.
Published: (2023)

Real-parameter unconstrained optimization based on enhanced fitness-adaptive differential evolution algorithm with novel mutation
by: Mohamed, Ali Wagdy, et al.
Published: (2020)

Class-incremental learning on multivariate time series via shape-aligned temporal distillation
by: Qiao, Zhongzheng, et al.
Published: (2023)

Coherent visual story generation using diffusion models
by: Jiang, Jiaxi
Published: (2024)

Multiple consumer-grade depth camera registration using everyday objects
by: Deng, Teng, et al.
Published: (2020)

Entry-flipped transformer for inference and prediction of participant behavior
by: Hu, Bo, et al.
Published: (2023)

Recovering facial reflectance and geometry from multi-view images
by: Song, Guoxian, et al.
Published: (2023)

A unified framework for examining program correctness
by: Alcabasa, Lance, et al.
Published: (2013)

Software-based unified security switch
by: Cagampan, Dennis H., et al.
Published: (2009)

Least squares KNN-based weighted multiclass twin SVM
by: Tanveer, M., et al.
Published: (2022)

3iGS: factorised tensorial illumination for 3D Gaussian splatting
by: Tang, Zhe Jun, et al.
Published: (2025)

A unified architecture for flat CORDIC
by: Bimal Gisuthan.
Published: (2008)

From qualitative data to correlation using deep generative networks: demonstrating the relation of nuclear position with the arrangement of actin filaments
by: Vasudevan, Jyothsna, et al.
Published: (2023)

Towards software cognitive complexity measure with granular structures of unified factors
by: Benjapol Auprasert
Published: (2012)

Customized image synthesis using diffusion models
by: Fu, Guanqiao
Published: (2024)

MPT-Net: mask point transformer network for large scale point cloud semantic segmentation
by: Tang, Zhe Jun, et al.
Published: (2023)

Unified Banking Process Framework (UBPF) Wiki
by: Ramesh Swaroop, Maitreyi, et al.
Published: (2014)

Discretized-Vapnik-Chervonenkis dimension for analyzing complexity of real function classes
by: Zhang, Chao, et al.
Published: (2013)

From noise to information: discriminative tasks based on randomized neural networks and generative tasks based on diffusion models
by: Hu, Minghui
Published: (2024)

Efficient and practical algorithms for discrete geodesics
by: Xiang, Ying
Published: (2013)

ABLE-NeRF: attention-based rendering with learnable embeddings for neural radiance field
by: Tang, Zhe Jun, et al.
Published: (2023)

Discrete differential geometry driven methods for architectural geometry
by: Yao, Sidan
Published: (2022)