Unsupervised learning with diffusion models


Full description

Bibliographic Details
Main Author: Wang, Jiankun
Other Authors: Weichen Liu
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/171953
Institution: Nanyang Technological University
Description
Summary: In computer vision, a key goal is to obtain visual representations that faithfully capture the underlying structure and semantics of the data, encompassing object identities, positions, textures, and lighting conditions. However, existing methods for un-/self-supervised learning (SSL) are restricted to untangling basic augmentation attributes such as rotation and color modification, which constrains their capacity to efficiently modularize the underlying semantics. In this thesis, we propose DiffSiam, a novel SSL framework that incorporates a disentangled representation learning algorithm based on diffusion models. By introducing additional Gaussian noise during the diffusion forward process, DiffSiam collapses samples with similar attributes, intensifying the attribute loss. To compensate, we learn an expanding set of modular features over time, guided by the diffusion model's reconstruction objective. These training dynamics bias the learned features toward disentangling diverse semantics, from fine-grained to coarse-grained attributes. Experimental results demonstrate the superior performance of DiffSiam on various classification benchmarks and generative tasks, validating its effectiveness in generating disentangled representations.
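The "diffusion forward process" the summary refers to is the standard DDPM noising step, in which a clean sample is progressively mixed with Gaussian noise so that, at large timesteps, samples sharing coarse attributes become statistically indistinguishable. The following is a minimal NumPy sketch of that generic forward process only, not of DiffSiam itself; the function name `forward_diffuse` and the linear beta schedule are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """Sample x_t ~ q(x_t | x_0) for a DDPM-style forward process:

        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,

    with eps ~ N(0, I) and alpha_bar_t = prod_{s<=t} (1 - beta_s).
    As t grows, alpha_bar_t shrinks and the signal is progressively
    replaced by Gaussian noise, which is what "collapses" samples
    that share coarse attributes.
    """
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)[t]   # cumulative signal retention at step t
    eps = rng.standard_normal(x0.shape)      # Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Illustrative linear noise schedule over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.ones((4, 4))                         # toy "image"
x_noisy = forward_diffuse(x0, t=999, betas=betas)
```

At the final step, `alpha_bar` is close to zero, so `x_noisy` is almost pure noise; at small `t` it stays close to `x0`.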