Text-to-drawing translation with limited data

Text-to-image translation has seen significant development with the assistance of enormous datasets and novel technologies. OpenAI's CLIP (Contrastive Language-Image Pre-training) is a comprehensive pre-trained neural network that encodes text and images in the same embedding space, thus providing the ability to correlate visual features with semantic words. Popular text-to-image models such as DALL-E 2, VQGAN-CLIP and Stable Diffusion all utilize CLIP's power in some way. While the field is dominated mainly by autoregressive (AR) and diffusion models, traditional generative adversarial networks (GANs) can produce high-quality images while requiring much less training data. In this project, with the help of CLIP, we explore the potential of StyleGAN3 for text-to-image translation on a custom dataset of 20k text-image pairs. We demonstrate three techniques with CLIP: image re-ranking, CLIP loss, and CLIP embedding as latent. We experiment with the three settings and find no positive correlation between the texts and the generated images. We conclude that although StyleGAN is powerful on its own, a strong text encoder is equally important for a good text-to-image model.
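The full thesis is not included in this record, but the two CLIP-based scoring techniques named in the abstract — image re-ranking and the CLIP loss — can be sketched from the abstract's description of CLIP's shared embedding space. The sketch below uses toy 3-d vectors standing in for real CLIP embeddings, and the function names (`clip_loss`, `rerank`) are illustrative, not the project's actual API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_loss(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """A common form of CLIP guidance loss: 1 - cosine similarity.

    Lower values mean the image embedding sits closer to the text
    embedding in the shared space (assumed form, not the thesis's exact loss).
    """
    return 1.0 - cosine_similarity(image_emb, text_emb)

def rerank(text_emb: np.ndarray, image_embs: list) -> list:
    """Indices of candidate images, best text match first."""
    return sorted(range(len(image_embs)),
                  key=lambda i: cosine_similarity(text_emb, image_embs[i]),
                  reverse=True)

# Toy embeddings standing in for CLIP's text and image encoders.
text = np.array([1.0, 0.0, 0.0])
candidates = [np.array([0.0, 1.0, 0.0]),   # orthogonal: poor match
              np.array([0.9, 0.1, 0.0]),   # nearly parallel: good match
              np.array([0.5, 0.5, 0.0])]   # partial match

order = rerank(text, candidates)
print(order)  # best-first: the nearly-parallel candidate ranks first
```

In the re-ranking setting, a GAN would generate several candidates per prompt and keep only the top-ranked one; in the loss setting, `clip_loss` would instead be backpropagated through the generator during training.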

Bibliographic Details
Main Author: Deng, Ziyang
Other Authors: Chen Change Loy (School of Computer Science and Engineering, ccloy@ntu.edu.sg)
Format: Final Year Project (FYP)
Degree: Bachelor of Engineering (Computer Engineering)
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Project Code: SCSE22-0309
Citation: Deng, Z. (2023). Text-to-drawing translation with limited data. Final Year Project (FYP), Nanyang Technological University, Singapore.
Online Access: https://hdl.handle.net/10356/166685
Institution: Nanyang Technological University