Architecture and algorithm development for generative adversarial networks

Generative Adversarial Networks (GANs) are a recently introduced class of generative models that can produce highly realistic samples such as images. In the GAN framework, a generator and a discriminator play a game against each other. The discriminator aims to find differences between the data distribution and th...

Bibliographic Details
Main Author: Yazici, Yasin
Other Authors: Yap Kim Hui
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/142989
Institution: Nanyang Technological University
id sg-ntu-dr.10356-142989
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description Generative Adversarial Networks (GANs) are a recently introduced class of generative models that can produce highly realistic samples such as images. In the GAN framework, a generator and a discriminator play a game against each other. The discriminator aims to find differences between the data distribution and the generator's distribution, while the generator aims to trick the discriminator. Both players iteratively update their own parameters to satisfy their own objectives. By playing this game, the generator learns to produce samples similar to those from the data distribution. Despite the success of GANs, they suffer from various problems, such as non-convergence (instability) of the objective function, mode drop (failure to model certain parts of the distribution), lack of efficient multi-distribution modeling, and limited generation quality. In this thesis, we propose novel algorithms and architectures and provide theoretical and empirical analysis to mitigate these problems.
In the first work of the thesis, we address the question of how to model the common and unique aspects of multiple distributions. Identifying these aspects is useful for subsequent tasks, e.g. determining what can be found in distribution A but not in distribution B. We propose a new GAN design called VennGAN, which models multiple distributions effectively and discovers their commonalities and particularities. Each data distribution is modeled with a mixture of multiple generator distributions. The generators are partially shared between the models of different data distributions: shared generators capture what the distributions have in common, while non-shared generators capture their unique aspects. We thoroughly evaluate VennGAN and show its effectiveness on various datasets (MNIST, Fashion-MNIST, CIFAR-10, CelebA) and settings. Additionally, we illustrate how this method can be used to incorporate external (prior) knowledge into distribution modeling.
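As a rough illustration of the partially shared mixture described in the first work, consider two data distributions A and B. The notation below (generator distributions p_{G_A}, p_{G_B}, p_{G_{AB}} and mixture weights \pi_A, \pi_B) is assumed here for illustration only and is not taken from the thesis:

```latex
\begin{align}
p_A(x) &\approx \pi_A \, p_{G_{AB}}(x) + (1 - \pi_A) \, p_{G_A}(x), \\
p_B(x) &\approx \pi_B \, p_{G_{AB}}(x) + (1 - \pi_B) \, p_{G_B}(x),
\qquad \pi_A, \pi_B \in [0, 1].
\end{align}
```

Under this reading, the shared component p_{G_{AB}} would account for samples plausible under both A and B, while p_{G_A} and p_{G_B} would account for samples unique to each.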
In the second work, we theoretically and empirically examine two techniques for parameter averaging in GAN training in order to mitigate non-convergence of the objective function. Moving Average (MA) computes the time-average of the network parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum of the network parameters. Whilst MA is known to lead to convergence in bilinear settings, we provide, to the best of our knowledge, the first theoretical analysis in support of EMA. We show that EMA shrinks the cycles around the equilibrium for simple bilinear games and also enhances the stability of general GAN training. We establish experimentally that both techniques are strikingly effective in the general (non-convex-concave) GAN setting as well. Both improve the commonly used Inception Score and Fréchet Inception Distance on different architectures and for different GAN objectives. We provide comprehensive experimental results across a range of datasets (mixture of Gaussians, CIFAR-10, STL-10, CelebA and ImageNet) to demonstrate their effectiveness. We have also applied EMA to the image in-painting problem, which results in smoother, more realistic images with fewer artifacts.
In the third work, we study important questions in the GAN literature, namely overfitting, mode drop and non-convergence, from an empirical perspective on high-dimensional distributions. Mode drop is the phenomenon of leaving out certain modes of the data distribution. Overfitting has important implications for privacy, as the generator can faithfully memorize training samples that might be private data, while mitigating mode drop and non-convergence leads to good distribution modeling. We show that, given enough capacity, when stochasticity is removed from the training (optimization), GANs can overfit, show almost no mode drop, and reduce non-convergence rates significantly. These results shed light on important characteristics of GAN training and dismiss certain beliefs in the current literature, such as that GANs do not memorize the training set and that mode drop and non-convergence are mainly due to theoretical characteristics of the GAN objective rather than an optimization issue.
In the fourth work, we aim to improve the quality of image generation with GANs by refining the discriminator architecture and its objective. We propose a novel GAN architecture and objective, called Autoregressive GANs (ARGAN), and successfully train it on the CIFAR-10, STL-10, and CelebA datasets. Unlike the original GAN setting, our method models the latent distribution of real samples to learn feature co-occurrences implicitly. Through efficient factorization, we arrive at two versions of our model, namely Spatial-wise ARGAN and Channel-wise ARGAN. Further, we combine the proposed Spatial-wise ARGAN with Patch-GAN to further improve the stability of training and the quality of image generation. Our results show that the proposed model outperforms DCGAN, EBGAN, and WGAN on the same architecture.
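The two parameter-averaging schemes named in the second work (MA and EMA) can be sketched on a toy bilinear game. This is a minimal illustration, not the thesis code: the game f(x, y) = x * y, the step size, the decay `beta`, and the function names are all assumptions chosen for the example.

```python
import math


def ema_update(avg, params, beta):
    """Exponential moving average: avg <- beta * avg + (1 - beta) * params."""
    return [beta * a + (1.0 - beta) * p for a, p in zip(avg, params)]


def ma_update(avg, params, t):
    """Uniform running mean after the t-th parameter vector arrives (t >= 1)."""
    return [a + (p - a) / t for a, p in zip(avg, params)]


def run_bilinear_game(steps=5000, lr=0.01, beta=0.999):
    """Simultaneous gradient descent-ascent on the assumed toy game f(x, y) = x * y.

    x minimises f and y maximises f; the equilibrium is (0, 0).
    Returns the final raw iterate plus its EMA and MA averages.
    """
    x, y = 1.0, 1.0
    ema = [x, y]  # both averages start from the initial parameters
    ma = [x, y]
    for t in range(1, steps + 1):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y  # simultaneous update
        ema = ema_update(ema, [x, y], beta)
        ma = ma_update(ma, [x, y], t + 1)  # initial point counts as sample 1
    return (x, y), ema, ma


def norm(v):
    """Euclidean distance of a 2-D point from the equilibrium (0, 0)."""
    return math.hypot(v[0], v[1])


if __name__ == "__main__":
    final, ema, ma = run_bilinear_game()
    print("raw iterate distance:", norm(final))
    print("EMA distance:        ", norm(ema))
    print("MA distance:         ", norm(ma))
```

In this toy setting the raw iterates circle the equilibrium while slowly drifting outward, and both averaged parameter vectors end up closer to (0, 0). In GAN training the same updates would be applied to the generator's parameters after each optimizer step, with the averaged copy used only for evaluation or sampling.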
author2 Yap Kim Hui
author_facet Yap Kim Hui
Yazici, Yasin
format Thesis-Doctor of Philosophy
author Yazici, Yasin
author_sort Yazici, Yasin
title Architecture and algorithm development for generative adversarial networks
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/142989
_version_ 1772826895514075136
spelling sg-ntu-dr.10356-142989 2023-07-04T17:20:46Z
Architecture and algorithm development for generative adversarial networks
Yazici, Yasin
Yap Kim Hui
School of Electrical and Electronic Engineering
Research Techno Plaza
EKHYap@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Doctor of Philosophy
2020-07-17T07:02:55Z
2020-07-17T07:02:55Z
2020
Thesis-Doctor of Philosophy
Yazici, Y. (2020). Architecture and algorithm development for generative adversarial networks. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/142989
10.32657/10356/142989
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University