Expanding the generality of neural fields

Bibliographic Details
Main Author: Lan, Yushi
Other Authors: Chen Change Loy
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2025
Subjects: Computer and Information Science; Computer vision; Computer graphics; Neural 3D fields
Online Access: https://hdl.handle.net/10356/182229
Institution: Nanyang Technological University
Full description

Neural fields have emerged as a groundbreaking approach to representing 3D shapes, garnering significant attention due to their compatibility with modern deep-learning techniques. Neural fields, which parameterize physical properties of scenes or objects across space and time, have achieved remarkable success in tasks such as 3D shape and image synthesis, animation, 3D reconstruction, and pose estimation. However, these promising results are predominantly achieved by overfitting to individual scenes or objects; research on generalizable neural fields has been largely overlooked. This limitation hinders the application of neural fields to downstream tasks such as single-view 3D reconstruction, 3D object generation, and editing. To address this issue, this thesis advances generalizable neural fields along two lines: generalizable neural field algorithms, which improve training methodologies, and generalizable neural field representations, which introduce novel 3D representations for specific applications. Together, these methods demonstrate significant potential for solutions that are effective, robust, and practical.

For generalizable neural field algorithms, correspondence-level generalization is explored first. Unlike with explicit shape representations such as meshes, establishing dense correspondences across Neural Radiance Fields (NeRFs) of the same category remains an open problem, made particularly challenging by the implicit nature of NeRFs and the lack of ground-truth correspondence annotations. This thesis shows that these challenges can be addressed by leveraging the rich semantics and structural priors encapsulated in pre-trained NeRF-based GANs. Specifically, three key innovations are introduced: 1) a dual deformation field guided by latent codes as global structural indicators, 2) a learning objective that uses generator features as geometry-aware local descriptors, and 3) a method for generating infinite object-specific NeRF samples. Experiments demonstrate that these innovations enable accurate, smooth, and robust 3D dense correspondences, facilitating downstream applications such as texture transfer.

To bridge the gap to real-world scenarios, this thesis further explores object-level generalization for neural fields. Specifically, it proposes E3DGE, a framework addressing the challenge of 3D GAN inversion: predicting a latent code from a single 2D image that faithfully recovers 3D shape and texture. The inherently ill-posed nature of the problem, coupled with the limited capacity of global latent codes, presents significant challenges. To overcome them, this thesis introduces an efficient self-training scheme that does not rely on real-world 2D-3D pairs but instead utilizes proxy samples generated from a 3D GAN. Additionally, the proposed approach enhances the generation network with a local branch that incorporates pixel-aligned features to accurately reconstruct texture details. Furthermore, a novel pipeline for 3D view-consistent editing is introduced. The efficacy of the proposed method is validated on two representative 3D GANs, StyleSDF and EG3D. Extensive experiments demonstrate that the proposed approach consistently outperforms state-of-the-art inversion methods, delivering superior quality in both shape and texture reconstruction.
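To make the self-training scheme concrete, below is a minimal, runnable sketch of the idea only (not the thesis implementation: the real E3DGE inverts pretrained StyleSDF/EG3D generators, and every module, dimension, and loss weight here is an invented toy stand-in):

```python
import torch
import torch.nn as nn

LATENT_DIM, IMG = 64, 32  # toy sizes, chosen only so the sketch runs quickly

class ToyGenerator(nn.Module):
    """Stand-in for a frozen 3D-aware generator G: latent -> rendered image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 3 * IMG * IMG)
    def forward(self, w):
        return self.net(w).view(-1, 3, IMG, IMG)

class ToyEncoder(nn.Module):
    """Global branch: predicts a single latent code from a 2D image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * IMG * IMG, LATENT_DIM))
    def forward(self, img):
        return self.net(img)

class LocalBranch(nn.Module):
    """Pixel-aligned features from the reconstruction residual. Here they
    predict a 2D correction for simplicity; in the thesis they instead
    condition the 3D generation network."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, residual):
        return self.conv(residual)

G, enc, local = ToyGenerator(), ToyEncoder(), LocalBranch()
G.requires_grad_(False)  # the generator stays frozen; it only supplies proxy data
opt = torch.optim.Adam(list(enc.parameters()) + list(local.parameters()), lr=1e-3)

for step in range(200):  # self-training: proxy pairs, no real 2D-3D annotations
    with torch.no_grad():
        w_gt = torch.randn(4, LATENT_DIM)  # sample latents ...
        img = G(w_gt)                      # ... and render matching proxy images
    w_pred = enc(img)                      # inversion: image -> latent code
    recon = G(w_pred)
    refined = recon + local(img - recon)   # residual carries missed texture detail
    loss = ((w_pred - w_gt) ** 2).mean() + ((refined - img) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the structure is that ground-truth latents come for free from the frozen generator, while the local branch recovers detail that a single global code cannot encode.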
For generalizable neural field representations, this thesis first investigates representations for generalizable 3D avatar heads. A novel framework is presented for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach constructs an implicit representation of 3D human heads anchored on a parametric face model. To enhance representational capability and encode spatial information, each semantically consistent head region is represented by a local tri-plane modulated by a 3D Gaussian. These tri-planes are further parameterized in a 2D UV space via a 3DMM, enabling effective use of diffusion models for 3D head avatar generation. The proposed method facilitates the creation of diverse and realistic 3D human heads with flexible global and fine-grained region-based editing over facial structure, appearance, and expression. Extensive experiments demonstrate the effectiveness of the proposed method.

Finally, this thesis designs neural field representations for general 3D objects and unifies the 3D diffusion pipeline within the latent diffusion paradigm. Specifically, a novel framework called LN3Diff is proposed to enable fast, high-quality, and generic conditional 3D generation. The method employs a 3D-aware architecture and a variational autoencoder (VAE) to encode the input image into a structured, compact 3D latent space; the latent is then decoded by a transformer-based decoder into a high-capacity 3D neural field. By training a diffusion model on this 3D-aware latent space, the method achieves state-of-the-art performance on ShapeNet for 3D generation, and it demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in inference speed, requiring no per-instance optimization. LN3Diff represents a significant advancement in 3D generative modeling and holds promise for diverse applications in 3D vision and graphics tasks.
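As a rough illustration of the latent-diffusion structure described above, the sketch below wires together toy stand-ins for the three stages (an encoder to a compact 3D latent, a denoiser trained in that latent space, and a transformer decoder to a tri-plane field). All module choices and dimensions are hypothetical, not LN3Diff's actual architecture:

```python
import torch
import torch.nn as nn

# Toy sizes: latent dim, number of latent tokens, flattened tri-plane size.
D_LAT, N_TOK, PLANE = 16, 8, 3 * 4 * 4 * 4

# Stage 1: VAE-style encoder (deterministic here): image -> compact 3D latent.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, N_TOK * D_LAT))
# Stage 3: transformer-based decoder: latent tokens -> tri-plane neural field.
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_LAT, nhead=4, batch_first=True), 2)
to_planes = nn.Linear(N_TOK * D_LAT, PLANE)
# Stage 2: stand-in for the diffusion denoiser that operates on the latent.
denoiser = nn.Linear(D_LAT, D_LAT)

def encode(img):
    return encoder(img).view(-1, N_TOK, D_LAT)

def decode(z):
    h = decoder(z)                      # attend across latent tokens
    return to_planes(h.flatten(1))      # flattened toy tri-plane features

# Diffusion is trained in the latent space, not on images or raw 3D assets.
# (In the real pipeline the VAE is trained first and then frozen.)
img = torch.randn(2, 3, 32, 32)
z0 = encode(img)
noise = torch.randn_like(z0)
z_noisy = 0.7 * z0 + 0.7 * noise        # one illustrative noising step
loss = ((denoiser(z_noisy) - noise) ** 2).mean()  # epsilon-prediction objective
loss.backward()

# Sampling then needs no per-instance optimization: denoise a latent, decode once.
planes = decode(encode(img))
print(planes.shape)  # (2, 192): one flattened toy tri-plane per object
```

Keeping the diffusion model in a small, 3D-aware latent space rather than in pixel or voxel space is what allows decoding in a single feed-forward pass, which is the source of the inference-speed advantage claimed above.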
Supervisor: Chen Change Loy, College of Computing and Data Science (ccloy@ntu.edu.sg)
Degree: Doctor of Philosophy
Deposited: 2025-01-16
Citation: Lan, Y. (2025). Expanding the generality of neural fields. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182229
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).