Reconstruction and manipulation of portraits

The large growth in consumer digital cameras and smart phones has led to prevalent use of digital photography. One typical category is portraiture, where a photograph is taken of a person's face or upper body. Analysis of portraits for 3D reconstruction and further manipulation has received a l...

Bibliographic Details
Main Author:	Song, Guoxian
Other Authors:	Cham Tat Jen
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/155400
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-155400
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Song, Guoxian Reconstruction and manipulation of portraits
description	The large growth in consumer digital cameras and smart phones has led to prevalent use of digital photography. One typical category is portraiture, where a photograph is taken of a person's face or upper body. Analysis of portraits for 3D reconstruction and further manipulation has received a lot of attention, as it is highly relevant to Virtual Reality (VR) and Augmented Reality (AR), as well as other forms of entertainment. This thesis presents my research contributions to image-based facial analysis, involving 3D facial geometry estimation and facial reflectance inference as well as advanced manipulation techniques for portrait relighting, shadow generation and portrait stylization. Part 1 describes methods for portrait reconstruction. In particular, two frameworks are presented that involve the well-known 3D Morphable Model representation. The first framework targets 3D face-eye performance capture under extreme occlusion, while the second handles reconstruction of facial reflectance maps and geometry for faces with significant specular reflections. More specifically, in Chapter 3, I present a CNN-based 3D face-eye capture system for users wearing head-mounted displays (HMDs). Our system integrates a 3D parametric gaze model into the 3D morphable face model, and can be used to produce a personalized digital avatar given an exterior RGB image of a user's face occluded by an HMD and an infrared (IR) eye image from the interior of the HMD, with no calibration needed. Moreover, to train the facial and eye gaze neural networks, we collect face and VR IR eye data from multiple subjects, and synthesize pairs of HMD face data with expression labels. In Chapter 4, I describe a model-based method to recover photorealistic facial reflectance and geometry from two video streams of a subject in two views.
After estimating an initial facial geometry and texture map, the framework jointly infers the specular and diffuse reflectance components, with further refinement of the geometry. This leads to significant improvement over prior art. By allowing better reconstruction of the shape of faces with specular reflections, the method enables more compelling rendering of faces with specular effects under new viewpoints. Part 2 describes approaches for portrait manipulation. In particular, three frameworks are presented that involve referred portrait neural relighting, shadow-aware portrait relighting for virtual background, and portrait stylization using limited exemplars. More specifically, in Chapter 5, I present an image-based deep generative model that can dynamically relight half-body portrait images. Key technical contributions include the proposed over-complete lighting representation, multiplicative neural rendering, and the separation of background and foreground for illumination feature encoding. We have also created a large rendered dataset with annotated and controlled lighting that is suitable for training our model, and which has sufficient photorealism to allow our model to be directly applied to real images. In Chapter 6, I present a new shadow-aware portrait relighting system that can relight an input portrait to be consistent with a given desired background image, including perceptually important shadow effects. Our system consists of four major components: portrait neutralization, illumination estimation, shadow generation and hierarchical neural rendering, which are all based on deep neural networks, with the whole system being end-to-end trainable. Extensive experiments demonstrate that our shadow-aware relighting system outperforms state-of-the-art portrait relighting methods in producing more lighting-consistent images with shadow effects.
In Chapter 7, I present AgileGAN, a framework that can generate high-quality stylistic portraits via inversion-consistent transfer learning. We propose a novel hierarchical variational autoencoder to ensure that the inverse-mapped distribution conforms to the original latent Gaussian distribution of the well-known StyleGAN model, while augmenting its original space to a multi-resolution latent space so as to better encode different levels of detail. We show that we can achieve portrait stylization quality superior to that of previous state-of-the-art methods, with comparisons done qualitatively, quantitatively and through a perceptual user study.
author2	Cham Tat Jen
author_facet	Cham Tat Jen Song, Guoxian
format	Thesis-Doctor of Philosophy
author	Song, Guoxian
author_sort	Song, Guoxian
title	Reconstruction and manipulation of portraits
title_short	Reconstruction and manipulation of portraits
title_full	Reconstruction and manipulation of portraits
title_fullStr	Reconstruction and manipulation of portraits
title_full_unstemmed	Reconstruction and manipulation of portraits
title_sort	reconstruction and manipulation of portraits
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/155400
_version_	1726885530869694464
spelling	sg-ntu-dr.10356-155400 2022-03-06T05:18:16Z Reconstruction and manipulation of portraits Song, Guoxian Cham Tat Jen School of Computer Science and Engineering Singtel Cognitive and Artificial Intelligence Lab ASTJCham@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2022-02-23T05:59:20Z 2022-02-23T05:59:20Z 2022 Thesis-Doctor of Philosophy Song, G. (2022). Reconstruction and manipulation of portraits. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155400 https://hdl.handle.net/10356/155400 10.32657/10356/155400 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Similar Items