Image synthesis in visual machine learning

Image synthesis aims to generate realistic, high-fidelity images automatically. It has attracted increasing interest from both academia and industry in recent years, owing to its wide applications across artificial intelligence (AI) tasks and to recent advances in generative adversarial networks (GANs). Image synthesis can produce realistic images of different objects and scenes, and hence forms a fundamental component of various design tasks, such as the automated generation of artworks, fashion, and advertisement posters. It can also produce self-annotated images that can be applied directly to deep model training, alleviating a key data constraint: deep neural networks usually require large amounts of annotated training images that are expensive and time-consuming to collect manually.

Automated generation of realistic, self-annotated images still faces two major challenges. First, synthesis realism requires fidelity in both image appearance (e.g. colors, brightness, styles) and image geometry (e.g. object sizes, alignment, perspective). Second, generating self-annotated yet useful training images remains an open research topic, and most existing generation networks handle it poorly because their generated images lack diversity.

We investigate automated image synthesis that aims to generate self-annotated, realistic images, either for visual design tasks or for effective training of deep neural networks. Our work falls broadly into three parts.

The first part is composition-based image synthesis, which generates realistic, self-annotated images by automatically embedding foreground objects into background images. Unlike most existing GAN-based generation networks, it can generate new images with superior diversity, and with genuinely new information, because the foreground objects and background images can come from different sources with completely different distributions. We developed three novel image composition techniques to tackle the central challenge of composition-based synthesis: automatically embedding foreground objects at the right locations, with the right appearance and geometry, within the background image.

The second part is translation-based image synthesis, which modifies existing images into new forms that are more useful for deep network training. Whereas most existing image-to-image translation networks adapt image styles and appearance, we developed two image translation networks that instead adapt image geometry: global image viewpoints and local, instance-level object shapes, respectively. The challenge lies in designing networks that estimate reliable geometric transformations, since these strongly affect the geometry of the translated images.

The third part focuses on 3-dimensional (3D) image synthesis, a new problem that aims to embed 3D virtual object models realistically into 2-dimensional (2D) natural images. To ensure that the embedded 3D objects have a realistic appearance, we design novel networks that first estimate the environmental lighting of the natural image and then re-light (render) the embedded 3D objects so that their brightness, color, shadows, etc. harmonize with the natural image.

Extensive experiments with different evaluation metrics show that our proposed synthesis networks generate images with superior fidelity and realism. In addition, we demonstrate that our synthesized images are self-annotated and can be applied directly to train deep neural networks for various computer vision tasks.
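As a rough illustration of the composition-based idea in the first part (a minimal sketch, not the thesis's actual techniques), the following Python snippet places a foreground object into a background under a fixed affine transform standing in for the learned placement, and obtains segmentation and detection annotations for free. The file names and transform values are hypothetical.

```python
# Minimal composition-based synthesis sketch: warp a foreground (RGBA)
# into a background, alpha-blend, and read off the free annotations.
import cv2
import numpy as np

bg = cv2.imread("bg.jpg")                        # background, HxWx3 uint8 (hypothetical file)
fg = cv2.imread("fg.png", cv2.IMREAD_UNCHANGED)  # foreground with alpha, hxwx4 (hypothetical file)

H, W = bg.shape[:2]
# Hand-picked affine transform standing in for the learned placement
# (location, scale, orientation) that the thesis estimates automatically.
M = cv2.getAffineTransform(
    np.float32([[0, 0], [fg.shape[1], 0], [0, fg.shape[0]]]),
    np.float32([[50, 80], [250, 60], [70, 260]]),
)
warped = cv2.warpAffine(fg, M, (W, H))           # foreground in background coordinates

alpha = warped[:, :, 3:4].astype(np.float32) / 255.0
composite = (alpha * warped[:, :, :3] + (1 - alpha) * bg).astype(np.uint8)

# Self-annotation: the warped alpha channel is a segmentation mask, and
# its bounding box is a detection label -- no manual labelling required.
mask = (warped[:, :, 3] > 0).astype(np.uint8)
x, y, w, h = cv2.boundingRect(mask)
print("bbox:", (x, y, w, h))
cv2.imwrite("composite.jpg", composite)
```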
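For the second part, the core operation is a differentiable geometric warp whose parameters a network can regress. Below is a minimal spatial-transformer-style sketch in PyTorch with made-up tensor shapes; it illustrates the mechanism only, not the thesis's architecture.

```python
# Differentiable affine warp: gradients flow from an image-level loss
# back to the transformation parameters, so a network predicting them
# can be trained end-to-end.
import torch
import torch.nn.functional as F

def warp(images, theta):
    """Apply per-image 2x3 affine transforms to a batch, differentiably.

    images: (N, C, H, W) tensor; theta: (N, 2, 3) affine parameters.
    """
    grid = F.affine_grid(theta, images.size(), align_corners=False)
    return F.grid_sample(images, grid, align_corners=False)

images = torch.rand(4, 3, 128, 128)
# Identity transforms as a stand-in for predicted viewpoint / shape changes.
theta = torch.eye(2, 3).repeat(4, 1, 1).requires_grad_(True)
out = warp(images, theta)
out.mean().backward()      # gradients reach the transform parameters
print(theta.grad.shape)    # torch.Size([4, 2, 3])
```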
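For the third part, one common way to re-light an inserted object from estimated scene lighting is a spherical-harmonics (SH) shading model. The sketch below assumes 2nd-order SH with made-up coefficients and random normals; in the thesis, the lighting is estimated from the natural image by a network.

```python
# Relighting sketch: shade an object's surface normals with SH lighting
# so its colors harmonize with the background photo.
import numpy as np

def sh_basis(normals):
    """2nd-order SH basis evaluated at unit normals, shape (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.full_like(x, 0.2820948),                    # Y00
        0.4886025 * y, 0.4886025 * z, 0.4886025 * x,   # Y1-1, Y10, Y11
        1.0925484 * x * y, 1.0925484 * y * z,
        0.3153916 * (3 * z * z - 1),
        1.0925484 * x * z, 0.5462742 * (x * x - y * y),
    ], axis=1)

# Hypothetical inputs: per-pixel normals and albedo from the 3D model's
# render, and SH lighting coefficients for the background photo.
normals = np.random.randn(1000, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = np.full((1000, 3), 0.6)
sh_coeffs = np.array([2.5, 0.2, 0.8, -0.1, 0.0, 0.3, 0.1, -0.2, 0.05])

shading = sh_basis(normals) @ sh_coeffs                # (N,) irradiance
relit = albedo * np.clip(shading, 0, None)[:, None]    # harmonized colors
```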

Bibliographic Details
Main Author: Zhan, Fangneng
Other Authors: Lu Shijian (School of Computer Science and Engineering)
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering
Online Access:https://hdl.handle.net/10356/148667
DOI: 10.32657/10356/148667
Citation: Zhan, F. (2021). Image synthesis in visual machine learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148667
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Institution: Nanyang Technological University