Effective image synthesis for effective deep neural network training

Bibliographic Details
Main Author: Cui, Kaiwen
Other Authors: Lu Shijian
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/174934
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
DOI: 10.32657/10356/174934
Citation: Cui, K. (2024). Effective image synthesis for effective deep neural network training. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174934
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Full description

State-of-the-art deep neural networks (DNNs) require large numbers of images to achieve accurate and robust models. Gathering large image collections remains the prevailing approach, but it is expensive, time-consuming, and hard to scale across different tasks and domains. To address this issue, data-limited image generation has been proposed: the primary concept is to automatically generate valuable and effective images specifically for training purposes. In this thesis, we handle data-limited image generation from three very different perspectives: regularization-based, augmentation-based, and knowledge distillation-based data-limited image generation.

In regularization-based data-limited image generation, we mitigate discriminator overfitting from the perspective of regularization. We propose GenCo, a novel Generative Co-training network that adapts the co-training idea to data-limited generation for tackling its inherent overfitting issue: it introduces multiple complementary discriminators that provide diverse supervision from multiple distinctive views during training. We instantiate the idea of GenCo in two ways. The first is Weight-Discrepancy Co-training (WeCo), which co-trains multiple distinctive discriminators by diversifying their parameters. The second is Data-Discrepancy Co-training (DaCo), which achieves co-training by feeding the discriminators different views of the input images.

In augmentation-based data-limited image generation, we explore two novel augmentation-based approaches for better generation performance. First, we introduce masked generative adversarial networks (MaskedGAN), a masking-based augmentation approach that learns robust image generation with limited training data. The idea of MaskedGAN is simple: it randomly masks out certain image information for effective GAN training with limited data. We develop two masking strategies that work along orthogonal dimensions of the training images: a shifted spatial masking that masks images in the spatial dimensions with random shifts, and a balanced spectral masking that masks certain image spectral bands with self-adaptive probabilities. The two strategies complement each other and together encourage more challenging holistic learning from limited training data, ultimately suppressing trivial solutions and failures in GAN training. Second, we design LDA, a Learnable Data Augmentation technique that introduces adversarial attacks to mitigate discriminator overfitting in data-efficient image-to-image (I2I) translation. Its core idea is adversarial spectrum dropout, which decomposes images into multiple spectra in frequency space and learns to drop certain image spectra to generate effective adversarial samples. LDA works in spectral space, which allows explicit access to and manipulation of each image spectrum and thereby enables direct attacks on the easy-to-discriminate spectra. It evolves dynamically with learnable parameters, making it more scalable and better at mitigating discriminator overfitting than the hand-crafted, non-learnable augmentation strategies in most existing studies.
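
To make the two MaskedGAN masking strategies concrete, below is a minimal PyTorch sketch of shifted spatial masking and balanced spectral masking as described above. It is an illustration, not the thesis's implementation: the mask ratio, the number of spectral bands, and the fixed band-dropping probabilities are assumed placeholders (the thesis uses self-adaptive probabilities).

```python
import torch

def shifted_spatial_mask(images: torch.Tensor, mask_ratio: float = 0.3) -> torch.Tensor:
    """Zero out one square window per image at a randomly shifted position.

    `mask_ratio` is an illustrative placeholder, not the thesis's setting.
    images: (B, C, H, W).
    """
    B, _, H, W = images.shape
    mh, mw = int(H * mask_ratio), int(W * mask_ratio)
    masked = images.clone()
    for i in range(B):
        # Random shift: the masked window lands at a different spot per image.
        top = torch.randint(0, H - mh + 1, (1,)).item()
        left = torch.randint(0, W - mw + 1, (1,)).item()
        masked[i, :, top:top + mh, left:left + mw] = 0.0
    return masked

def balanced_spectral_mask(images: torch.Tensor, n_bands: int = 4,
                           band_probs=None) -> torch.Tensor:
    """Drop radial frequency bands of each image with per-band probabilities.

    `band_probs` stands in for the self-adaptive probabilities described in
    the thesis; fixed values are used here for simplicity.
    """
    if band_probs is None:
        band_probs = [0.25] * n_bands
    B, _, H, W = images.shape
    # Move to frequency space, with low frequencies at the centre.
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    radius = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    radius = radius / radius.max()  # normalised radial distance in [0, 1]
    for i in range(B):
        for b in range(n_bands):
            if torch.rand(1).item() < band_probs[b]:
                band = (radius >= b / n_bands) & (radius < (b + 1) / n_bands)
                spec[i, :, band] = 0.0  # zero out this spectral band
    # Back to the spatial domain; any imaginary residue is numerical noise.
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
```

In training, such masks would be applied to both real and generated batches before they reach the discriminator, so that the discriminator never sees complete images and has to rely on more holistic cues rather than memorising the limited training set.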
In knowledge distillation-based data-limited image generation, we propose KD-DLGAN, a knowledge distillation-based generation framework that introduces pretrained vision-language models for training effective data-limited generation models. KD-DLGAN consists of two innovative designs. The first is aggregated generative KD, which mitigates discriminator overfitting by challenging the discriminator with harder learning tasks and distilling more generalizable knowledge from the pretrained models. The second is correlated generative KD, which improves generation diversity by distilling and preserving the diverse image-text correlations within the pretrained models. Experimental results on various data-limited image generation benchmarks indicate that our proposed approaches achieve superior performance with limited training data.
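
For intuition, here is a simplified PyTorch sketch of the basic feature-distillation term behind such generative KD. It is a stand-in rather than KD-DLGAN's actual design: `discriminator`, `teacher`, `proj`, and `kd_weight` are hypothetical names, the teacher is assumed to be a frozen CLIP-style image encoder exposing `encode_image()`, and the aggregated and correlated KD objectives of the thesis are reduced here to a single cosine-alignment loss.

```python
import torch
import torch.nn.functional as F

def generative_kd_loss(student_feats: torch.Tensor,
                       teacher_feats: torch.Tensor) -> torch.Tensor:
    """Cosine-distance distillation between discriminator and teacher features."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()

def discriminator_step(discriminator, teacher, proj, real, fake,
                       kd_weight: float = 0.1):
    """One discriminator update with an added distillation term.

    Assumptions (not from the thesis): `discriminator` returns
    (logits, features); `teacher` is a frozen CLIP-style image encoder
    exposing .encode_image(); `proj` maps discriminator features to the
    teacher's embedding width; `kd_weight` is an illustrative weight.
    """
    logits_real, feats_real = discriminator(real)
    logits_fake, _ = discriminator(fake)
    # Standard non-saturating GAN loss for the discriminator.
    gan_loss = F.softplus(-logits_real).mean() + F.softplus(logits_fake).mean()
    with torch.no_grad():  # the pretrained teacher stays frozen
        teacher_feats = teacher.encode_image(real).float()
    kd_loss = generative_kd_loss(proj(feats_real), teacher_feats)
    return gan_loss + kd_weight * kd_loss
```

Freezing the teacher and distilling on real images keeps the pretrained knowledge as a stable reference while the discriminator adapts; keeping the KD weight small prevents the distillation term from dominating the adversarial loss.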