Image synthesis in visual machine learning

Bibliographic Details
Main Author: Zhan, Fangneng
Other Authors: Lu, Shijian
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Online Access:https://hdl.handle.net/10356/148667
Institution: Nanyang Technological University
Description
Summary: Image synthesis aims to generate realistic, high-fidelity images automatically. It has attracted increasing interest from both academia and industry in recent years, owing to its wide applications in various artificial intelligence (AI) tasks as well as recent advances in generative adversarial networks (GANs). Specifically, image synthesis can generate realistic images of different objects and scenes, and hence forms a fundamental component of design tasks such as the automated generation of artworks, fashion, and advertisement posters. In addition, image synthesis can produce self-annotated images that can be applied directly to deep model training, alleviating the data constraint: deep neural networks usually require large amounts of annotated training images that are expensive and time-consuming to collect manually. However, the automated generation of realistic, self-annotated images still faces two major challenges. First, synthesis realism requires realistic image appearance (e.g., colors, brightness, and styles) as well as realistic image geometry (e.g., object sizes, alignment, and perspective). Second, generating self-annotated and useful training images remains an open research topic, and most existing generation networks handle it poorly due to the lack of diversity in their generated images. We investigate automated image synthesis that aims to generate self-annotated, realistic images for visual design tasks and for the effective training of deep neural networks. Our work can be broadly grouped into three parts.

The first part is composition-based image synthesis, which generates realistic, self-annotated images by automatically embedding foreground objects into background images. Unlike most existing GAN-based generation networks, it can generate new images with superior diversity as well as new information, since the foreground objects and background images can come from different sources with completely different distributions. We developed three novel image composition techniques to tackle the central challenge of composition-based synthesis: automatically embedding foreground objects at the right locations, with the right appearance and geometry, within the background image.

The second part is translation-based image synthesis, which modifies existing images into new forms that are more useful for deep network training. Unlike most existing image-to-image translation networks, which focus on adapting image styles and appearance, we developed two image translation networks that adapt image geometry, namely global image viewpoints and local instance-level object shapes, respectively. The challenge lies in designing networks that estimate reliable geometric transformations, which greatly affect the geometry of the translated images.

The third part focuses on 3-dimensional (3D) image synthesis, a new problem that aims to realistically embed 3D virtual object models into 2-dimensional (2D) natural images. To ensure that the embedded 3D object models have a realistic appearance, we design novel networks that first estimate the environmental lighting of the natural images and then re-light (or render) the embedded 3D objects so that their brightness, color, shadows, etc. are harmonious with the natural images (a toy sketch of this re-lighting step follows this summary).

Extensive experiments with different evaluation metrics show that our proposed synthesis networks can generate images with superior fidelity and realism.
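To make the re-lighting idea above concrete, the following is a minimal sketch, assuming an object with known per-pixel surface normals and a single directional light standing in for the estimated environmental lighting. It is an illustration only, not the thesis's networks (which estimate lighting from the image itself), and the function and parameter names are hypothetical.

```python
# Minimal Lambertian re-lighting sketch (illustrative only; hypothetical names).
# The thesis estimates full environmental lighting with a network; here the
# "estimated" light is a hand-picked direction and color.
import numpy as np

def relight_lambertian(albedo, normals, light_dir, light_color, ambient=0.2):
    """albedo: HxWx3 in [0,1]; normals: HxWx3 unit vectors;
    light_dir: 3-vector toward the light; light_color: RGB 3-vector."""
    l = np.asarray(light_dir, dtype=np.float32)
    l /= np.linalg.norm(l)                      # normalize the light direction
    # Diffuse term per pixel: max(0, n . l)
    n_dot_l = np.clip(np.einsum("hwc,c->hw", normals, l), 0.0, None)
    shading = ambient + (1.0 - ambient) * n_dot_l[..., None] * light_color
    return np.clip(albedo * shading, 0.0, 1.0)

# Toy usage: a flat gray patch facing the viewer, lit warmly from the upper left.
albedo = np.full((4, 4, 3), 0.8, dtype=np.float32)
normals = np.zeros((4, 4, 3), dtype=np.float32)
normals[..., 2] = 1.0                           # all normals point at the viewer
lit = relight_lambertian(albedo, normals, light_dir=[-1.0, 1.0, 1.0],
                         light_color=np.array([1.0, 0.95, 0.9]))
print(lit.shape, float(lit.min()), float(lit.max()))
```

Harmonizing shadows and color casts in the thesis involves far richer lighting models; this sketch only shows how an estimated light, once available, determines the shading of an inserted object.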
In addition, we demonstrate that our synthesized images are self-annotated and can be applied directly and effectively to train deep neural networks for various computer vision tasks.
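The "self-annotated" property is the key practical payoff of composition-based synthesis: because the synthesizer places the object itself, it knows the object's location for free. The sketch below is a minimal, assumption-laden illustration of that idea in isolation; the thesis learns where and how to place objects with dedicated networks, whereas here placement is supplied by hand, and all names are hypothetical.

```python
# Composition with a free bounding-box label (illustrative only; hypothetical names).
import numpy as np

def compose(background, foreground, mask, top, left):
    """background: HxWx3; foreground: hxwx3; mask: hxw, all values in [0,1]."""
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    # Alpha-blend the foreground over the background where the mask is on.
    out[top:top + h, left:left + w] = (
        mask[..., None] * foreground + (1.0 - mask[..., None]) * region)
    # The placement itself yields the annotation: (x0, y0, x1, y1).
    bbox = (left, top, left + w, top + h)
    return out, bbox

# Toy usage: a bright square "object" pasted onto a gray background.
bg = np.full((64, 64, 3), 0.5, dtype=np.float32)
fg = np.full((16, 16, 3), 0.9, dtype=np.float32)
m = np.ones((16, 16), dtype=np.float32)
image, bbox = compose(bg, fg, m, top=10, left=20)
print(bbox)  # (20, 10, 36, 26) -- directly usable as a detection label
```

In the thesis, realism further requires adapting the pasted object's geometry (e.g., an estimated transformation) and appearance (e.g., color and brightness harmonization) to the background, which this sketch deliberately omits.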