Exploring pre-trained diffusion models in a tuning-free manner

Bibliographic Details
Main Author: Wang, Jinghao
Other Authors: Liu, Ziwei
Format: Thesis (Master by Research)
Language: English
Published: Nanyang Technological University, 2025
Online Access: https://hdl.handle.net/10356/181937
Institution: Nanyang Technological University

Description
Abstract: Diffusion models, which employ a multi-step denoising sampling procedure and are trained on extensive image-text pair datasets, have emerged as an innovative class of deep generative models, exhibiting superior performance across applications including image synthesis and video generation. In this thesis, we explore applications of pre-trained diffusion models beyond text-to-image generation, all in a tuning-free manner. In Chapter 1, we discuss image morphing between two real images via diffusion models. Our approach, FreeMorph, is built on key insights regarding attention interpolation and layout similarity in latent noise, both of which are critical for enhancing morphing quality. In Chapter 2, we discuss attention interpolation in diffusion models, introducing a novel training-free technique named Attention Interpolation via Diffusion (AID). AID makes two key contributions: 1) a fused inner/outer interpolated attention layer that boosts image consistency and fidelity; and 2) selection of interpolation coefficients via a beta distribution to increase smoothness.
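
The abstract attributes FreeMorph's quality in part to layout similarity in latent noise. A standard way to traverse between two latent noise tensors while keeping their norms consistent with the Gaussian prior is spherical interpolation (slerp). The sketch below illustrates that generic idea only; the abstract does not specify FreeMorph's actual procedure, and the function name and tensor shapes here are assumptions.

```python
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, alpha: float) -> torch.Tensor:
    """Spherical interpolation between two Gaussian latent noise tensors.

    Unlike linear interpolation, slerp keeps intermediate latents at a
    norm consistent with the Gaussian prior, which tends to preserve
    layout structure in the decoded images.
    """
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    # Angle between the two noise vectors, clamped for numerical safety.
    cos_theta = torch.dot(z0_flat, z1_flat) / (z0_flat.norm() * z1_flat.norm())
    theta = torch.arccos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    w0 = torch.sin((1 - alpha) * theta) / torch.sin(theta)
    w1 = torch.sin(alpha * theta) / torch.sin(theta)
    return w0 * z0 + w1 * z1

# Usage: blend the initial noise of two (hypothetically inverted) real images.
z_a, z_b = torch.randn(4, 64, 64), torch.randn(4, 64, 64)
z_mid = slerp(z_a, z_b, 0.5)
```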
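AID's first contribution is described as a fused inner/outer interpolated attention layer. A minimal PyTorch sketch of that general idea follows: the interpolated sample attends jointly to its own keys/values (inner branch) and to a blend of keys/values cached from the two endpoint images (outer branch). The tensor names and the exact fusion rule are assumptions; the thesis's implementation may differ.

```python
import torch
import torch.nn.functional as F

def fused_interpolated_attention(q, k_self, v_self, k_a, v_a, k_b, v_b, alpha):
    """Hypothetical fused inner/outer interpolated attention.

    q, k_self, v_self : projections of the interpolated sample (inner branch)
    k_a/v_a, k_b/v_b  : keys/values cached from the two endpoint images
                        (outer branch); all tensors are (batch, length, dim)
    alpha             : interpolation coefficient in [0, 1]
    """
    # Outer branch: a linear blend of the endpoints' keys/values pulls the
    # interpolated sample toward both sources for consistency.
    k_outer = (1.0 - alpha) * k_a + alpha * k_b
    v_outer = (1.0 - alpha) * v_a + alpha * v_b
    # Fuse the branches along the sequence axis so a single softmax weighs
    # self-attention (fidelity) against endpoint attention (consistency).
    k = torch.cat([k_self, k_outer], dim=1)
    v = torch.cat([v_self, v_outer], dim=1)
    return F.scaled_dot_product_attention(q, k, v)
```

Concatenating along the sequence axis is the simplest way to fuse the two branches under one softmax; other weightings between the inner and outer attention are equally plausible readings of the abstract.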
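The second contribution selects interpolation coefficients via a beta distribution rather than a uniform grid. One plausible reading, sketched below, is to place coefficients at evenly spaced quantiles of a Beta distribution; the shape parameters are illustrative assumptions, not values given in this abstract.

```python
import numpy as np
from scipy.stats import beta

def interpolation_coefficients(n: int, a: float = 2.0, b: float = 2.0):
    """Place n interpolation coefficients at evenly spaced quantiles of a
    Beta(a, b) distribution instead of on a uniform grid.

    With a symmetric Beta(a, b), a = b > 1, coefficients cluster where the
    density is high, spacing frames non-uniformly along the transition.
    """
    levels = np.linspace(0.0, 1.0, n + 2)[1:-1]  # interior probability levels
    return beta.ppf(levels, a, b)

print(interpolation_coefficients(5))  # five coefficients in (0, 1)
```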