Effective image synthesis for effective deep neural network training

Bibliographic Details
Main Author: Cui, Kaiwen
Other Authors: Lu, Shijian
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/174934
Institution: Nanyang Technological University
Description
Summary: Training state-of-the-art deep neural networks (DNNs) requires large numbers of images to obtain accurate and robust models. Gathering large image collections remains the prevailing approach, yet it is expensive, time-consuming, and hard to scale across tasks and domains. To address this issue, data-limited image generation has been proposed; its core idea is to automatically generate valuable and effective images specifically for training. In this thesis, we tackle data-limited image generation from three very different perspectives: regularization-based, augmentation-based, and knowledge distillation-based data-limited image generation.

In regularization-based data-limited image generation, we mitigate discriminator overfitting from the perspective of regularization. We propose GenCo, a novel Generative Co-training network that adapts the co-training idea to data-limited generation and tackles its inherent overfitting issue by introducing multiple complementary discriminators that provide diverse supervision from distinct views during training. We instantiate GenCo in two ways: Weight-Discrepancy Co-training (WeCo), which co-trains multiple distinctive discriminators by diversifying their parameters, and Data-Discrepancy Co-training (DaCo), which achieves co-training by feeding the discriminators different views of the input images (a minimal co-training sketch is given after this summary).

In augmentation-based data-limited image generation, we explore two novel augmentation-based approaches for better generation performance. First, we introduce masked generative adversarial networks (MaskedGAN), a masking-strategy-based augmentation approach that learns robust image generation from limited training data. The idea of MaskedGAN is simple: it randomly masks out certain image information for effective GAN training with limited data. We develop two masking strategies that work along orthogonal dimensions of the training images: a shifted spatial masking that masks images in the spatial dimensions with random shifts, and a balanced spectral masking that masks certain image spectral bands with self-adaptive probabilities (both sketched below). The two strategies complement each other and together encourage more challenging, holistic learning from limited training data, ultimately suppressing trivial solutions and failures in GAN training. Second, we design LDA, a Learnable Data Augmentation technique that introduces adversarial attacking to mitigate discriminator overfitting in data-efficient image-to-image (I2I) translation. Its core idea, adversarial spectrum dropout, decomposes images into multiple spectra in frequency space and learns to drop certain spectra to generate effective adversarial samples (also sketched below). Because LDA works in spectral space, it allows explicit access to and manipulation of each image spectrum, which enables direct attacks on the easy-to-discriminate spectra. It evolves dynamically with learnable parameters, making it more scalable and better at mitigating discriminator overfitting than the hand-crafted, non-learnable augmentation strategies used in most existing studies.
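The co-training idea can be made concrete with a short sketch. The following PyTorch fragment is only an illustrative approximation of GenCo, assuming two discriminators `d1` and `d2` of identical architecture: the `view1`/`view2` augmentations stand in for DaCo, the weight-discrepancy term stands in for WeCo, and the loss form and weighting `lam` are hypothetical rather than taken from the thesis.

```python
# Illustrative sketch (not the thesis implementation) of co-training two
# complementary discriminators with diverse supervision.
import torch
import torch.nn.functional as F


def co_training_disc_loss(d1, d2, real, fake, view1, view2, lam=0.01):
    def bce(logits, target_value):
        target = torch.full_like(logits, target_value)
        return F.binary_cross_entropy_with_logits(logits, target)

    # DaCo-style discrepancy: each discriminator sees a different view
    # (augmentation) of the same real and generated images.
    adv = (bce(d1(view1(real)), 1.0) + bce(d1(view1(fake)), 0.0)
           + bce(d2(view2(real)), 1.0) + bce(d2(view2(fake)), 0.0))

    # WeCo-style discrepancy: keep the two discriminators' weights diverse by
    # rewarding (subtracting) their pairwise parameter distance.
    discrepancy = sum((p1 - p2).pow(2).sum()
                      for p1, p2 in zip(d1.parameters(), d2.parameters()))
    return adv - lam * discrepancy
```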
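The two masking strategies of MaskedGAN can likewise be sketched in a few lines. The fragment below is a minimal illustration assuming square spatial masks and equal-width radial frequency bands; the actual masking schedules and the self-adaptive band probabilities described in the thesis are not reproduced.

```python
# Minimal sketch of shifted spatial masking and spectral-band masking.
import torch


def shifted_spatial_mask(images, mask_ratio=0.5):
    """Zero out a square region whose position is randomly shifted per batch."""
    _, _, h, w = images.shape
    mh, mw = int(h * mask_ratio), int(w * mask_ratio)
    top = torch.randint(0, h - mh + 1, (1,)).item()
    left = torch.randint(0, w - mw + 1, (1,)).item()
    masked = images.clone()
    masked[:, :, top:top + mh, left:left + mw] = 0.0
    return masked


def spectral_band_mask(images, drop_prob=0.2, n_bands=8):
    """Randomly drop radial frequency bands of the 2-D Fourier spectrum."""
    _, _, h, w = images.shape
    device = images.device
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing="ij")
    radius = torch.sqrt((yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2)
    band_width = radius.max() / n_bands
    keep = torch.ones(h, w, device=device)
    for i in range(n_bands):
        # uniform drop probability here; the thesis uses self-adaptive ones
        if torch.rand(1).item() < drop_prob:
            in_band = (radius >= i * band_width) & (radius < (i + 1) * band_width)
            keep[in_band] = 0.0
    return torch.fft.ifft2(torch.fft.ifftshift(spec * keep, dim=(-2, -1))).real
```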
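Adversarial spectrum dropout in LDA can be approximated in the same spirit: a learnable gate over frequency bands is updated to increase the discriminator loss so that the retained spectra become harder to discriminate. The band partitioning, the soft sigmoid gate, and the ascent step in the sketch below are illustrative assumptions, not the thesis design.

```python
# Sketch of a learnable spectrum-dropout gate updated adversarially.
import torch
import torch.nn as nn


class SpectrumDropout(nn.Module):
    """Learnable soft gate over radial frequency bands of an image spectrum."""

    def __init__(self, n_bands=8):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_bands))  # learnable keep-gates

    def forward(self, images):
        _, _, h, w = images.shape
        device = images.device
        spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
        yy, xx = torch.meshgrid(torch.arange(h, device=device),
                                torch.arange(w, device=device), indexing="ij")
        radius = torch.sqrt((yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2)
        band = (radius / (radius.max() + 1e-8)
                * (self.logits.numel() - 1)).long()       # band index per pixel
        keep = torch.sigmoid(self.logits)[band]            # (h, w) soft gate
        out = torch.fft.ifft2(torch.fft.ifftshift(spec * keep, dim=(-2, -1)))
        return out.real


# Hypothetical adversarial update: ascend the discriminator loss with respect
# to the gate logits so the retained spectra are harder to discriminate.
#   loss = d_loss_fn(discriminator(spectrum_dropout(real_images)))
#   loss.backward()
#   with torch.no_grad():
#       spectrum_dropout.logits += attack_lr * spectrum_dropout.logits.grad
```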
In knowledge distillation-based data-limited image generation, we propose KD-DLGAN, a knowledge distillation-based generation framework that introduces pre-trained vision-language models for training effective data-limited generation models. KD-DLGAN consists of two innovative designs. The first is aggregated generative KD, which mitigates discriminator overfitting by challenging the discriminator with harder learning tasks and distilling more generalizable knowledge from the pre-trained models (a rough distillation sketch is given below). The second is correlated generative KD, which improves generation diversity by distilling and preserving the diverse image-text correlation within the pre-trained models. Experimental results over various data-limited image generation benchmarks show that our proposed approaches achieve superior performance with limited training data.
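As a rough illustration of distilling from a pre-trained vision-language model, the sketch below aligns intermediate discriminator features with frozen CLIP image embeddings through a cosine loss. It assumes the openai/CLIP package, a hypothetical projection head `proj_head`, and images already resized and normalised for CLIP; it is not the actual KD-DLGAN objective.

```python
# Sketch of a generative knowledge-distillation term for the discriminator.
import torch
import torch.nn.functional as F
import clip  # openai/CLIP; any frozen vision-language encoder would do

clip_model, _ = clip.load("ViT-B/32", device="cuda")
clip_model.eval()


def generative_kd_loss(disc_features, images, proj_head):
    """Cosine distance between projected discriminator features and frozen
    CLIP image embeddings (assumes images are already CLIP-preprocessed)."""
    with torch.no_grad():
        target = clip_model.encode_image(images).float()  # (B, 512) for ViT-B/32
    student = proj_head(disc_features)                    # project to (B, 512)
    return 1.0 - F.cosine_similarity(student, target, dim=-1).mean()

# The distillation term is then added to the usual adversarial loss when
# updating the discriminator, e.g.
#   d_loss = adv_loss + lambda_kd * generative_kd_loss(feat, real_imgs, proj_head)
```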