Text-to-image generation based on generative adversarial network
Generating images from text descriptions lies at the intersection of natural language processing and computer vision: the task is to generate an image whose semantic details conform to an input text description. Generative Adversarial Networks (GANs), one of the most popular generative methods, are an important solution for text-to-image generation.

Text-to-image methods based on GANs have developed rapidly in recent years. They generally use a Conditional GAN (CGAN) architecture, feeding textual descriptions into the network as extra features so that images are generated under text constraints. For our study, we choose the widely used Stacked GAN (StackGAN) model as the baseline and propose the following improvements to its model structure and training procedure as our contribution to this research area.

On the model structure, the StackGAN consists of two generators. We improve the textual conditioning by adding the textual embeddings to each upsampling block as a multi-level input; in this manner, the image generators receive richer semantic information, improving image-text consistency. We also add Non-local blocks within the two generators so that the network better integrates global information. On the training procedure, we propose to train the network with the Wasserstein GAN (WGAN) method to alleviate the difficulty of training the original GAN. In particular, we use the Wasserstein distance as the guidance signal for GAN training, which also alleviates the common problem of mode collapse.

To evaluate the impact of our modifications to the original StackGAN, we conduct ablation experiments on the CUB-200-2011 bird dataset. The results show that the revised network outperforms the original network.
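As a concrete illustration of the multi-level conditioning described above, the sketch below traces feature-map shapes through a StackGAN-like stage-I generator in which the text embedding is concatenated to the feature map before every upsampling block, rather than only at the input. This is a hypothetical shape-bookkeeping sketch, not code from the thesis; the channel counts (512, 256, ...) and embedding dimension (128) are illustrative assumptions.

```python
# Illustrative sketch (not thesis code): multi-level text conditioning.
# At every upsampling block the (spatially replicated) text embedding is
# concatenated along the channel axis, so each stage receives the text
# signal directly instead of only at the generator input.

def upsample_block(channels, size, out_channels):
    """One upsampling step: spatial size doubles, channel count changes."""
    return out_channels, size * 2

def condition_on_text(channels, embed_dim):
    """Channel-wise concatenation of the text embedding adds embed_dim channels."""
    return channels + embed_dim

def stage1_generator_shapes(embed_dim=128):
    """Trace (channels, spatial_size) through a StackGAN-like stage-I generator."""
    channels, size = 512, 4          # initial feature map from the latent code
    trace = [(channels, size)]
    for out_channels in (256, 128, 64, 32):
        channels = condition_on_text(channels, embed_dim)  # multi-level injection
        channels, size = upsample_block(channels, size, out_channels)
        trace.append((channels, size))
    return trace

print(stage1_generator_shapes())
# prints [(512, 4), (256, 8), (128, 16), (64, 32), (32, 64)]
```

In a real network each `upsample_block` would be a convolutional block and the concatenation would happen on tensors; the point of the sketch is only that the text embedding enters at every scale, so semantic information is available throughout the upsampling path.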
Main Author: Guo, Xiangzuo
Other Authors: Jiang Xudong (School of Electrical and Electronic Engineering)
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access: https://hdl.handle.net/10356/164293
Institution: Nanyang Technological University
Degree: Master of Science (Signal Processing)
Citation: Guo, X. (2022). Text-to-image generation based on generative adversarial network. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164293
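The Wasserstein-GAN training signal described in the abstract can be sketched as follows. This is a generic illustration of the WGAN objective with weight clipping, not code from the thesis; all function names and the toy numbers are invented for illustration. The critic is trained to maximize E[f(real)] − E[f(fake)], which approximates the Wasserstein-1 distance between the real and generated distributions when f is 1-Lipschitz.

```python
# Illustrative sketch (not thesis code): the WGAN training signal.
# The critic maximizes E[f(real)] - E[f(fake)]; the 1-Lipschitz constraint
# is crudely enforced here by weight clipping, as in the original WGAN method.

def mean(xs):
    return sum(xs) / len(xs)

def critic_loss(f_real, f_fake):
    """Critic minimizes this, i.e. maximizes E[f(real)] - E[f(fake)]."""
    return mean(f_fake) - mean(f_real)

def generator_loss(f_fake):
    """Generator tries to raise the critic's score on its samples."""
    return -mean(f_fake)

def clip_weights(weights, c=0.01):
    """Crude Lipschitz constraint: clamp every critic weight to [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

# Toy critic scores: real images rated higher than fakes,
# so the critic loss is negative (roughly -0.7 here).
print(critic_loss([0.9, 0.8], [0.2, 0.1]))
print(clip_weights([0.5, -0.3, 0.004]))   # prints [0.01, -0.01, 0.004]
```

Because this loss is a meaningful distance rather than a saturating classification signal, training tends to be more stable and less prone to mode collapse, which is the motivation the abstract gives for adopting it.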