Effective image synthesis for effective deep neural network training
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/174934 |
Institution: | Nanyang Technological University |
Summary:
State-of-the-art deep neural networks (DNNs) require a large number of images to achieve accurate and robust models. However, gathering a large number of images remains the prevailing approach, which is expensive, time-consuming, and difficult to scale across different tasks and domains. To address this issue, data-limited image generation has been proposed. The primary idea behind this approach is to automatically generate valuable and effective images specifically for training purposes.
In this thesis, we address data-limited image generation from three different perspectives: regularization-based, augmentation-based, and knowledge distillation-based data-limited image generation.
In regularization-based data-limited image generation, we mitigate discriminator overfitting from the perspective of regularization. We propose GenCo, a novel Generative Co-training network that adapts the co-training idea to data-limited generation to tackle its inherent overfitting issue. Specifically, GenCo mitigates discriminator overfitting by introducing multiple complementary discriminators that provide diverse supervision from multiple distinctive views during training. We instantiate the idea of GenCo in two ways. The first is Weight-Discrepancy Co-training (WeCo), which co-trains multiple distinctive discriminators by diversifying their parameters. The second is Data-Discrepancy Co-training (DaCo), which achieves co-training by feeding the discriminators with different views of the input images.
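As an illustration of the co-training idea (not the thesis's exact formulation), the minimal PyTorch sketch below supervises one generator with several complementary discriminators, each holding its own parameters (in the spirit of WeCo) and receiving a different view of the generated image (in the spirit of DaCo). The network sizes, the two views, and the loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, img_dim = 64, 3 * 32 * 32
# Toy generator and two complementary discriminators (stand-ins for real GAN networks).
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
discriminators = nn.ModuleList(
    [nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)) for _ in range(2)]
)
# DaCo-style discrepancy: each discriminator sees a distinct view of the same sample.
views = [lambda x: x, lambda x: x + 0.05 * torch.randn_like(x)]

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

def generator_step(batch_size=8):
    z = torch.randn(batch_size, z_dim)
    fake = G(z)
    # Average the non-saturating GAN loss over all complementary discriminators.
    loss = sum(F.softplus(-D(v(fake))).mean() for D, v in zip(discriminators, views)) / len(discriminators)
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()

print(generator_step())
```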
In augmentation-based data-limited image generation, we explore two novel augmentation-based approaches to achieve better generation performance. First, we introduce masked generative adversarial networks (MaskedGAN), a masking-strategy-based augmentation approach that enables robust image generation learning with limited training data. The idea of MaskedGAN is simple: it randomly masks out certain image information for effective GAN training with limited data. We develop two masking strategies that work along orthogonal dimensions of the training images: a shifted spatial masking that masks images in the spatial dimensions with random shifts, and a balanced spectral masking that masks certain image spectral bands with self-adaptive probabilities. The two masking strategies complement each other, together encouraging more challenging holistic learning from limited training data and ultimately suppressing trivial solutions and failures in GAN training.
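A rough sketch of the two kinds of masking is given below: a spatially shifted block mask and a radial frequency-band mask applied in spectral space. The block size, the number of bands, and the masking probabilities are assumptions made for illustration; in particular, the self-adaptive band probabilities described above are replaced here by a fixed value.

```python
import torch

def shifted_spatial_mask(x, block=8, ratio=0.3):
    """Randomly drop square blocks, with a random shift of the block grid."""
    b, c, h, w = x.shape
    sh, sw = torch.randint(0, block, (2,)).tolist()        # random grid shift
    gh, gw = (h + block - 1) // block + 1, (w + block - 1) // block + 1
    keep = (torch.rand(b, 1, gh, gw) > ratio).float()
    mask = keep.repeat_interleave(block, 2).repeat_interleave(block, 3)
    return x * mask[:, :, sh:sh + h, sw:sw + w]

def spectral_band_mask(x, bands=4, p=0.25):
    """Zero out randomly selected radial frequency bands of the image spectrum."""
    f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    r = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2).sqrt()
    band = (r / r.max() * bands).long().clamp(max=bands - 1)  # band index per frequency
    drop = torch.rand(bands) < p                               # which bands to mask
    f = f * (~drop[band]).float()
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

x = torch.randn(2, 3, 32, 32)
print(shifted_spatial_mask(x).shape, spectral_band_mask(x).shape)
```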
Second, we design LDA, a learnable data augmentation technique that introduces adversarial attacks to mitigate discriminator overfitting in data-efficient image-to-image (I2I) translation. The core idea is adversarial spectrum dropout, which decomposes images into multiple spectra in frequency space and learns to drop certain image spectra to generate effective adversarial samples. LDA works in the spectral space, which allows explicit access to and manipulation of each image spectrum and accordingly enables direct attacks on the easy-to-discriminate image spectra. It evolves dynamically with learnable parameters, which makes it more scalable and better at mitigating discriminator overfitting than the hand-crafted, non-learnable augmentation strategies used in most existing studies.
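The sketch below illustrates one plausible reading of adversarial spectrum dropout: images are decomposed into radial frequency bands, and learnable per-band keep probabilities are updated to increase the discriminator loss, so the bands the discriminator relies on most are dropped. The band count, the sigmoid relaxation, and the update rule are assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrumDropout(nn.Module):
    def __init__(self, bands=4, size=32):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(bands))      # learnable keep-logits per band
        yy, xx = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
        r = ((yy - size / 2) ** 2 + (xx - size / 2) ** 2).sqrt()
        self.register_buffer("band", (r / r.max() * bands).long().clamp(max=bands - 1))

    def forward(self, x):
        keep = torch.sigmoid(self.logits)                    # relaxed keep probabilities
        f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        f = f * keep[self.band]                              # soft-drop selected spectra
        return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))   # toy discriminator
aug = SpectrumDropout()
opt_aug = torch.optim.Adam(aug.parameters(), lr=1e-3)

real = torch.randn(8, 3, 32, 32)
d_loss = F.softplus(-D(aug(real))).mean()
# Adversarial update: the dropout parameters ascend the discriminator loss.
opt_aug.zero_grad()
(-d_loss).backward()
opt_aug.step()
```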
In knowledge distillation-based data-limited image generation, we propose KD-DLGAN, a knowledge distillation-based generation framework that introduces pre-trained vision-language models for training effective data-limited generation models. KD-DLGAN consists of two innovative designs. The first is aggregated generative knowledge distillation, which mitigates discriminator overfitting by challenging the discriminator with harder learning tasks and distilling more generalizable knowledge from the pre-trained models. The second is correlated generative knowledge distillation, which improves generation diversity by distilling and preserving the diverse image-text correlations within the pre-trained models.
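As a simplified illustration of generative knowledge distillation, the sketch below gives the discriminator an extra head that must match features from a frozen teacher encoder, so the discriminator faces a harder task and inherits more general knowledge. The tiny random "teacher" is only a placeholder for the pre-trained vision-language encoder used in the thesis, and the cosine feature-matching loss is one plausible instantiation rather than the exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder for a frozen pre-trained image encoder (e.g. a vision-language model).
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
for p in teacher.parameters():
    p.requires_grad_(False)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.LeakyReLU(0.2))
        self.adv_head = nn.Linear(256, 1)     # real/fake prediction
        self.kd_head = nn.Linear(256, 128)    # predicts the teacher's features

    def forward(self, x):
        h = self.backbone(x)
        return self.adv_head(h), self.kd_head(h)

D = Discriminator()
real, fake = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)

logits_real, feat_real = D(real)
logits_fake, _ = D(fake)
adv_loss = F.softplus(-logits_real).mean() + F.softplus(logits_fake).mean()
# Distillation term: align discriminator features with the frozen teacher's features.
kd_loss = 1 - F.cosine_similarity(feat_real, teacher(real), dim=-1).mean()
(adv_loss + 0.5 * kd_loss).backward()
```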
Experimental results on various data-limited image generation benchmarks show that our proposed approaches achieve superior performance with limited training data.