Text-to-image generation based on generative adversarial network

Generating images from text descriptions lies at the intersection of natural language processing and computer vision: the task is to produce an image whose semantic details match an input description. Generative Adversarial Networks (GANs), among the most popular generative methods, are an important solution for text-to-image generation. GAN-based text-to-image methods have developed rapidly in recent years; they generally adopt a Conditional GAN (CGAN) architecture, feeding the textual description to the network as extra features so that generation is constrained by the text. For our study, we choose the widely used Stacked GAN (StackGAN) model as the baseline and propose improvements to both the model structure and the training procedure as our contribution to this research area. On the model structure, StackGAN consists of two generators; we improve the textual conditioning by adding the text embeddings to each upsampling block as multi-level input, so that the generators receive richer semantic information and image-text consistency improves. We also add Non-local blocks to both generators so that the network better integrates global information. On the training procedure, we adopt the Wasserstein GAN (WGAN) method, using the Wasserstein distance as the guidance signal for training; this eases the training difficulties of the original GAN and alleviates the common problem of mode collapse. To evaluate the impact of our modifications, we conduct ablation experiments on the CUB-200-2011 bird dataset. The results show that the revised network outperforms the original StackGAN.
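The two techniques named in the abstract can be sketched in a few lines. The sketch below is our reading of the ideas, not the thesis's actual implementation: a NumPy toy of (a) an upsampling block that broadcasts a sentence embedding over the spatial grid and concatenates it to the feature map (the multi-level text conditioning), and (b) the standard WGAN critic/generator objectives. All tensor shapes and function names here are hypothetical.

```python
import numpy as np

def upsample_nearest(x, scale=2):
    # x: (C, H, W) feature map; repeat pixels to grow the spatial size
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def conditioned_upsample_block(x, text_emb):
    # Multi-level text conditioning: broadcast the sentence embedding
    # over the (upsampled) spatial grid and concatenate it channel-wise,
    # so every generator stage sees the text, not just the input stage.
    x = upsample_nearest(x)                      # (C, 2H, 2W)
    _, h, w = x.shape
    t = np.broadcast_to(text_emb[:, None, None], (text_emb.size, h, w))
    return np.concatenate([x, t], axis=0)        # (C + D, 2H, 2W)

def wgan_losses(critic_real, critic_fake):
    # Standard WGAN objectives: the critic's loss approximates the
    # negative Wasserstein distance; the generator pushes fake scores up.
    d_loss = critic_fake.mean() - critic_real.mean()  # critic minimizes
    g_loss = -critic_fake.mean()                      # generator minimizes
    return d_loss, g_loss

feat = np.random.randn(64, 8, 8)   # hypothetical 8x8, 64-channel features
emb = np.random.randn(128)         # hypothetical 128-d sentence embedding
out = conditioned_upsample_block(feat, emb)
print(out.shape)                   # (192, 16, 16)
```

In a real model a convolution would follow the concatenation to mix the text channels back into the image features; the toy stops at the concatenation to keep the shape bookkeeping visible.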

Bibliographic Details
Main Author: Guo, Xiangzuo
Other Authors: Jiang Xudong
Format: Thesis-Master by Coursework
Language:English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:https://hdl.handle.net/10356/164293
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Supervisor: Jiang Xudong (EXDJiang@ntu.edu.sg)
Degree: Master of Science (Signal Processing)
Citation: Guo, X. (2022). Text-to-image generation based on generative adversarial network. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164293
Collection: DR-NTU (NTU Library)