Masked diffusion transformer is a strong image synthesizer

Despite its success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack contextual reasoning ability to learn the relations among object parts in an image, leading to a slow learning process. To solve this issue, we propose a Masked Diffusion Transformer (MDT) that i...

Bibliographic Details
Main Authors:	GAO, Shanghua, ZHOU, Pan, CHENG, Ming-Ming, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Training; Representation learning; Image synthesis; Computational modeling; Synthesizers; Source coding; Semantics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/9024 https://ink.library.smu.edu.sg/context/sis_research/article/10027/viewcontent/2023_ICCV_MDT.pdf
Institution:	Singapore Management University

Similar Items

Towards understanding why mask reconstruction pretraining helps in downstream tasks
by: PAN, Jiachun, et al.
Published: (2023)

EditAnything: Empowering unparalleled flexibility in image editing and generation
by: GAO, Shanghua, et al.
Published: (2023)

Computer-driven synthesizer (musical instrument digital interface)
by: Encarnacion, Memory Joy R., et al.
Published: (1994)

Multimedia event detection: Strong by integration
by: ZHANG, Hao, et al.
Published: (2015)

Few-shot learner parameterization by diffusion time-steps
by: YUE, Zhongqi, et al.
Published: (2024)

Representations of keypoint-based semantic concept detection: A comprehensive study
by: JIANG, Yu-Gang, et al.
Published: (2010)

Fast semantic diffusion for large-scale context-based image and video annotation
by: JIANG, Yu-Gang, et al.
Published: (2012)

Diffusion time-step curriculum for one image to 3D generation
by: YI, Xuanyu, et al.
Published: (2024)

InceptionNeXt: When Inception meets ConvNeXt
by: YU, Weihao, et al.
Published: (2024)

LPT: Long-tailed prompt tuning for image classification
by: DONG, Bowen, et al.
Published: (2023)

Exploring diffusion time-steps for unsupervised representation learning
by: YUE, Zhongqi, et al.
Published: (2024)

Efficient meta learning via minibatch proximal update
by: ZHOU, Pan, et al.
Published: (2019)

Unsupervised modality adaptation with text-to-Image diffusion models for semantic segmentation
by: XIA, Ruihao, et al.
Published: (2024)

Self-promoted supervision for few-shot transformer
by: DONG, Bowen, et al.
Published: (2022)

High-resolution face swapping via latent semantics disentanglement
by: XU, Yangyang, et al.
Published: (2022)

On the pooling of positive examples with ontology for visual concept learning
by: ZHU, Shiai, et al.
Published: (2011)

Video graph transformer for video question answering
by: XIAO, Junbin, et al.
Published: (2022)

Let’s think outside the box: Exploring leap-of-thought in large language models with multimodal humor generation
by: ZHONG, Shanshan, et al.
Published: (2024)

Pro-Cap: Leveraging a frozen vision-language model for hateful meme detection
by: CAO, Rui, et al.
Published: (2023)

MANDO-HGT: Heterogeneous graph transformers for smart contract vulnerability detection
by: NGUYEN, Huu Hoang, et al.
Published: (2023)

Beyond textual constraints: Learning novel diffusion conditions with fewer examples
by: YU, Yuyang, et al.
Published: (2024)

Motion-based approach for BBC rushes structuring and characterization
by: NGO, Chong-wah, et al.
Published: (2005)

A theory-driven self-labeling refinement method for contrastive representation learning
by: ZHOU, Pan, et al.
Published: (2021)

Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds
by: ZHOU, Pan, et al.
Published: (2019)

Selection of concept detectors for video search by ontology-enriched semantic spaces
by: WEI, Xiao-Yong, et al.
Published: (2008)

Feature prediction diffusion model for video anomaly detection
by: YAN, Cheng, et al.
Published: (2023)

Creation of content
by: Alvaro, Ma. Veronica Francesca A.
Published: (2024)

Rethinking multi-view representation learning via distilled disentangling
by: KE, Guanzhou, et al.
Published: (2024)

VireoJD-MM @ TRECVID 2019: Activities in extended video (ACTEV)
by: HOU, Zhijian, et al.
Published: (2019)

Prototypical contrastive learning of unsupervised representations
by: LI, Junnan, et al.
Published: (2021)

Consistent3D: Towards consistent high-fidelity text-to-3D generation with deterministic sampling prior
by: WU, Zike, et al.
Published: (2024)

Position-guided text prompt for vision-language pre-training
by: WANG, Alex Jinpeng, et al.
Published: (2023)

Hierarchical semantic-aware neural code representation
by: JIANG, Yuan, et al.
Published: (2022)

Improving GAN training with probability ratio clipping and sample reweighting
by: WU, Yue, et al.
Published: (2020)

Task relation networks
by: LI, Jianshu, et al.
Published: (2019)

Deciphering On-Off signalling network of Streptomyces secondary metabolism
by: Takuya, Nihira
Published: (2011)

Synthesis, structure, Hirshfeld surface analysis and catecholase activity of Ni(II) complex with sterically constrained phenol based ligand
by: Mandal, Bikramaditya, et al.
Published: (2021)

LargeEA: Aligning entities for large-scale knowledge graphs
by: GE, Congcong, et al.
Published: (2022)

RIS check hostname source code
by: Inamarga, Harold N.
Published: (2006)

Shutdown in gates source code
by: Inamarga, Harold N.
Published: (2006)