DRAGGAN-3D: Interactive point-dragging manipulation of 3D GANs

Full Description

Bibliographic Details
Main Author: Wang, Haoxuan
Other Authors: Xingang Pan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181164
Physical Description
Summary: Recent advancements in 3D generative models have enabled the creation of high-quality 3D content, but intuitive user manipulation of these generated models remains a significant challenge. This project introduces DragGAN-3D, an innovative framework that extends the point-based manipulation capabilities of DragGAN to the 3D domain, allowing users to intuitively edit 3D models by dragging control points in space. Building upon the tri-plane architecture of EG3D, our method enables precise, geometry-consistent modifications while maintaining multi-view consistency and high-fidelity 3D representations. The framework employs an iterative optimization process consisting of motion supervision and point tracking steps, with a dynamic discrete masking technique to control the scope of edits. We demonstrate the effectiveness of our approach through experiments on both randomly generated models and real-world images inverted into the latent space. Results show that DragGAN-3D successfully enables various edits, from subtle facial feature adjustments to geometric modifications, while preserving the overall quality and consistency of the 3D model. Our method is compatible with existing GAN inversion techniques, allowing the manipulation of real-world images, and proves robust across different datasets and pre-trained models. By bridging the gap between user intent and 3D model manipulation, DragGAN-3D represents a step forward in making 3D generative models more accessible and user-friendly for creative applications.
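The iterative loop summarized above, alternating motion supervision (nudging a handle point toward its target) with point tracking (relocating the handle by nearest-neighbour feature matching), can be illustrated with a toy sketch. The snippet below is NOT the project's actual tri-plane/EG3D implementation: it uses a random 2D NumPy feature grid as a stand-in for generator features, and all names (`track_point`, `drag`, `features`) are illustrative assumptions, not identifiers from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a generator feature map: an H x W grid of C-dim features.
H, W, C = 32, 32, 8
features = rng.normal(size=(H, W, C))

def track_point(ref_feat, center, radius=2):
    """Point tracking: among grid cells near `center`, return the one whose
    feature is the nearest neighbour of the handle's reference feature."""
    y0, x0 = center
    best, best_pos = np.inf, center
    for y in range(max(0, y0 - radius), min(H, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(W, x0 + radius + 1)):
            d = np.linalg.norm(features[y, x] - ref_feat)
            if d < best:
                best, best_pos = d, (y, x)
    return best_pos

def drag(handle, target, n_iters=50):
    """Alternate toy motion supervision and point tracking until the
    handle point reaches the target point."""
    ref = features[handle].copy()  # reference feature at the handle
    p = handle
    for _ in range(n_iters):
        d = np.array(target) - np.array(p)
        n = np.linalg.norm(d)
        if n < 1.0:                # close enough: done
            break
        step = np.round(d / n).astype(int)
        new = (p[0] + step[0], p[1] + step[1])
        # Toy "motion supervision": pretend the optimization moved the
        # content, so the feature at `new` now matches the reference
        # while the old location's feature changes.
        features[new] = ref
        features[p] = rng.normal(size=C)
        p = track_point(ref, new)  # re-locate the handle by tracking
    return p
```

In the real method, the motion-supervision step optimizes the latent code so features shift toward the target (here faked by overwriting grid cells), and a dynamic discrete mask restricts which regions may change; the alternation structure, however, is the same.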