Text-to-image generation based on generative adversarial network

Generating images from text descriptions lies at the intersection of natural language processing and computer vision: the task is to produce an image whose semantic details match an input description. Generative Adversarial Networks (GANs), among the most popular generative methods, are an important solution for text-to-image generation. GAN-based text-to-image methods have developed rapidly in recent years; they generally adopt a Conditional GAN (CGAN) architecture, feeding the textual description to the network as extra features so that generation is constrained by the text. For our study, we choose the widely used Stacked GAN (StackGAN) model as the baseline and propose improvements to both the model structure and the training procedure as our contribution to this research area. On the model structure, StackGAN consists of two generators; we improve the textual conditioning by adding the text embeddings to each upsampling block as multi-level input, so that the generators receive richer semantic information and image-text consistency improves. We also add Non-local blocks to both generators so that the network better integrates global information. On the training procedure, we adopt the Wasserstein GAN (WGAN) method, using the Wasserstein distance as the guidance signal for training; this eases the training difficulties of the original GAN and alleviates the common problem of mode collapse. To evaluate the impact of our modifications, we conduct ablation experiments on the CUB-200-2011 bird dataset. The results show that the revised network outperforms the original StackGAN.
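The two techniques named in the abstract can be sketched in a few lines. The sketch below is our reading of the ideas, not the thesis's actual implementation: a NumPy toy of (a) an upsampling block that broadcasts a sentence embedding over the spatial grid and concatenates it to the feature map (the multi-level text conditioning), and (b) the standard WGAN critic/generator objectives. All tensor shapes and function names here are hypothetical.

```python
import numpy as np

def upsample_nearest(x, scale=2):
    # x: (C, H, W) feature map; repeat pixels to grow the spatial size
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def conditioned_upsample_block(x, text_emb):
    # Multi-level text conditioning: broadcast the sentence embedding
    # over the (upsampled) spatial grid and concatenate it channel-wise,
    # so every generator stage sees the text, not just the input stage.
    x = upsample_nearest(x)                      # (C, 2H, 2W)
    _, h, w = x.shape
    t = np.broadcast_to(text_emb[:, None, None], (text_emb.size, h, w))
    return np.concatenate([x, t], axis=0)        # (C + D, 2H, 2W)

def wgan_losses(critic_real, critic_fake):
    # Standard WGAN objectives: the critic's loss approximates the
    # negative Wasserstein distance; the generator pushes fake scores up.
    d_loss = critic_fake.mean() - critic_real.mean()  # critic minimizes
    g_loss = -critic_fake.mean()                      # generator minimizes
    return d_loss, g_loss

feat = np.random.randn(64, 8, 8)   # hypothetical 8x8, 64-channel features
emb = np.random.randn(128)         # hypothetical 128-d sentence embedding
out = conditioned_upsample_block(feat, emb)
print(out.shape)                   # (192, 16, 16)
```

In a real model a convolution would follow the concatenation to mix the text channels back into the image features; the toy stops at the concatenation to keep the shape bookkeeping visible.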

Bibliographic Details
Main Author: Guo, Xiangzuo
Other Authors: Jiang Xudong
Format: Thesis-Master by Coursework
Language:English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:https://hdl.handle.net/10356/164293
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Supervisor: Jiang Xudong (EXDJiang@ntu.edu.sg)
Degree: Master of Science (Signal Processing)
Citation: Guo, X. (2022). Text-to-image generation based on generative adversarial network. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164293
Collection: DR-NTU (NTU Library)