Reconstruction and manipulation of portraits
Main Author: | Song, Guoxian |
---|---|
Other Authors: | Cham Tat Jen |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/155400 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-155400 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Song, Guoxian Reconstruction and manipulation of portraits |
description |
The rapid growth of consumer digital cameras and smartphones has led to the widespread use of digital photography. One typical category is portraiture, where a photograph is taken of a person's face or upper body. Analysis of portraits for 3D reconstruction and further manipulation has received considerable attention, as it is highly relevant to Virtual Reality (VR) and Augmented Reality (AR), as well as other forms of entertainment. This thesis presents my research contributions to image-based facial analysis, involving 3D facial geometry estimation and facial reflectance inference, as well as advanced manipulation techniques for portrait relighting, shadow generation and portrait stylization.
Part 1 describes methods for portrait reconstruction. In particular, two frameworks are presented that build on the well-known 3D Morphable Model representation. The first framework targets 3D face-eye performance capture under extreme occlusion, while the second handles reconstruction of facial reflectance maps and geometry for faces with significant specular reflections. More specifically, in Chapter 3, I present a CNN-based 3D face-eye capture system for users wearing head-mounted displays (HMDs). Our system integrates a 3D parametric gaze model into the 3D morphable face model, and can be used to produce a personalized digital avatar given an exterior RGB image of a user's face occluded by an HMD and an infrared (IR) eye image from the interior of the HMD, with no calibration needed. Moreover, to train the facial and eye gaze neural networks, we collect face and VR IR eye data from multiple subjects, and synthesize pairs of HMD face data with expression labels. In Chapter 4, I describe a model-based method to recover photorealistic facial reflectance and geometry from two video streams of a subject captured from two views. After estimating the initial facial geometry and texture map, the framework jointly infers specular and diffuse reflectance components, with further refinement of the geometry. This leads to significant improvement over prior art: by allowing better reconstruction of the shape of faces with specular reflections, it enables more compelling rendering of faces with specular effects under new viewpoints.
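As a rough illustration of the 3D Morphable Model representation mentioned above, the sketch below forms a face shape as a mean shape plus weighted identity and expression basis vectors. This is the generic linear formulation, not the thesis's implementation; the function name `morphable_face`, the toy basis vectors and the coefficient names `alpha` and `beta` are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def morphable_face(mean: Vector, id_basis: List[Vector], exp_basis: List[Vector],
                   alpha: Vector, beta: Vector) -> Vector:
    """Return flattened vertex coordinates:
    mean + sum_i alpha[i] * id_basis[i] + sum_j beta[j] * exp_basis[j].
    (Hypothetical helper; names are illustrative only.)
    """
    if len(alpha) != len(id_basis) or len(beta) != len(exp_basis):
        raise ValueError("one coefficient is required per basis vector")
    shape = list(mean)  # copy so the mean shape is left untouched
    weighted = list(zip(alpha, id_basis)) + list(zip(beta, exp_basis))
    for coeff, basis in weighted:
        if len(basis) != len(mean):
            raise ValueError("basis vectors must match the mean shape length")
        for k, value in enumerate(basis):
            shape[k] += coeff * value
    return shape


# Toy data: a single vertex (x, y, z) with one identity and one expression direction.
toy_mean = [0.0, 0.0, 1.0]
toy_id_basis = [[1.0, 0.0, 0.0]]
toy_exp_basis = [[0.0, 2.0, 0.0]]
toy_face = morphable_face(toy_mean, toy_id_basis, toy_exp_basis,
                          alpha=[0.5], beta=[0.25])
```

In this formulation a face is described by a small set of coefficients rather than by every vertex, so a network only has to predict `alpha` and `beta` (plus pose and, in the thesis's setting, gaze parameters) to produce a full 3D shape.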
Part 2 describes approaches for portrait manipulation. In particular, three frameworks are presented that involve referred portrait neural relighting, shadow-aware portrait relighting for virtual backgrounds, and portrait stylization using limited exemplars. More specifically, in Chapter 5, I present an image-based deep generative model that can dynamically relight half-body portrait images. Key technical contributions include the proposed over-complete lighting representation, multiplicative neural rendering, and the separation of background and foreground for illumination feature encoding. We have also created a large rendered dataset with annotated and controlled lighting that is suitable for training our model, and which is sufficiently photorealistic to allow our model to be applied directly to real images. In Chapter 6, I present a new shadow-aware portrait relighting system that can relight an input portrait to be consistent with a given desired background image, including perceptually important shadow effects. Our system consists of four major components: portrait neutralization, illumination estimation, shadow generation and hierarchical neural rendering. All four are based on deep neural networks, and the whole system is end-to-end trainable. Extensive experiments demonstrate that our shadow-aware relighting system outperforms state-of-the-art portrait relighting methods, producing more lighting-consistent images with shadow effects. In Chapter 7, I present AgileGAN, a framework that can generate high-quality stylistic portraits via inversion-consistent transfer learning. We propose a novel hierarchical variational autoencoder to ensure that the inverse-mapped distribution conforms to the original latent Gaussian distribution of the well-known StyleGAN model, while augmenting its original space to a multi-resolution latent space so as to better encode different levels of detail. We show that we can achieve superior portrait stylization quality compared with previous state-of-the-art methods, with comparisons done qualitatively, quantitatively and through a perceptual user study. |
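The abstract does not spell out how the encoder's output is kept close to StyleGAN's Gaussian latent distribution. One standard way a variational autoencoder does this is a KL-divergence penalty towards a standard normal, summed here over several latent levels. The sketch below shows that generic term only; the names `kl_to_standard_normal`, `hierarchical_kl` and `sample_latent`, and the per-level layout, are hypothetical and are not taken from AgileGAN.

```python
import math
import random
from typing import List, Sequence, Tuple

# One latent level: (mean vector, log-variance vector). Illustrative layout only.
Level = Tuple[Sequence[float], Sequence[float]]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats."""
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def hierarchical_kl(levels: Sequence[Level]) -> float:
    """Sum the KL term over every latent level (assumed independent per level)."""
    return sum(kl_to_standard_normal(mu, log_var) for mu, log_var in levels)


def sample_latent(mu: Sequence[float], log_var: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Reparameterised sample z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# Toy example: a coarse level that already matches N(0, I) and a fine level
# whose mean is shifted by 1.
toy_levels = [([0.0, 0.0], [0.0, 0.0]), ([1.0], [0.0])]
total_kl = hierarchical_kl(toy_levels)
toy_sample = sample_latent([1.0], [0.0], random.Random(0))
```

The penalty is zero only when a level's mean is 0 and its variance is 1, so minimising it pulls the encoded codes towards the region of latent space the generator was trained to sample from.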
author2 |
Cham Tat Jen |
author_facet |
Cham Tat Jen Song, Guoxian |
format |
Thesis-Doctor of Philosophy |
author |
Song, Guoxian |
author_sort |
Song, Guoxian |
title |
Reconstruction and manipulation of portraits |
title_short |
Reconstruction and manipulation of portraits |
title_full |
Reconstruction and manipulation of portraits |
title_fullStr |
Reconstruction and manipulation of portraits |
title_full_unstemmed |
Reconstruction and manipulation of portraits |
title_sort |
reconstruction and manipulation of portraits |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/155400 |
_version_ |
1726885530869694464 |
spelling |
sg-ntu-dr.10356-155400 2022-03-06T05:18:16Z Reconstruction and manipulation of portraits Song, Guoxian Cham Tat Jen School of Computer Science and Engineering Singtel Cognitive and Artificial Intelligence Lab ASTJCham@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2022-02-23T05:59:20Z 2022-02-23T05:59:20Z 2022 Thesis-Doctor of Philosophy Song, G. (2022). Reconstruction and manipulation of portraits. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155400 https://hdl.handle.net/10356/155400 10.32657/10356/155400 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |