Text-to-drawing translation with limited data

Text-to-image translation has seen significant development with the assistance of enormous datasets and novel technologies. OpenAI's CLIP (Contrastive Language-Image Pretraining) is a comprehensive pre-trained neural network that encodes text and images into the same embedding space, providing the ability to correlate visual features with semantic words. Popular text-to-image models such as DALL-E 2, VQGAN-CLIP and Stable Diffusion all utilize CLIP's power in some way. While the field is dominated mainly by autoregressive (AR) and diffusion models, traditional generative adversarial networks (GANs) are capable of producing high-quality images while requiring much less training data. In this project, with the help of CLIP, we explore the potential of StyleGAN3 for text-to-image translation on a custom dataset of 20k text-image pairs. We demonstrate three techniques with CLIP: image re-ranking, CLIP loss, and CLIP embedding as latent. Experimenting with all three settings, we find no positive correlation between the input texts and the generated images. We conclude that although StyleGAN is powerful on its own, a strong text encoder is equally important for a good text-to-image model.
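The three CLIP-based techniques named in the abstract can be illustrated with a minimal sketch. Toy NumPy vectors stand in for real CLIP text/image embeddings, and the helper names (`clip_rerank`, `clip_loss`, `clip_latent`) are hypothetical, not the project's actual code:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def clip_rerank(text_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Image re-ranking: order candidate images by similarity to the
    text embedding, best match first."""
    sims = np.array([cosine_similarity(text_emb, img) for img in image_embs])
    return np.argsort(-sims)


def clip_loss(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """CLIP loss: 1 - cosine similarity, minimized when the generated
    image's embedding aligns with the text embedding."""
    return 1.0 - cosine_similarity(text_emb, image_emb)


def clip_latent(text_emb: np.ndarray, z: np.ndarray) -> np.ndarray:
    """CLIP embedding as latent: concatenate the text embedding with a
    noise vector to form the generator's conditional input latent."""
    return np.concatenate([text_emb, z])
```

In a real pipeline the embeddings would come from CLIP's text and image encoders, the loss would backpropagate through the generator, and the latent would feed StyleGAN3's mapping network; the sketch only shows the vector arithmetic each technique rests on.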

Full Description

Saved in:
Bibliographic Details
Main Author: Deng, Ziyang
Other Authors: Chen Change Loy
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/166685
Institution: Nanyang Technological University
id sg-ntu-dr.10356-166685
record_format dspace
spelling sg-ntu-dr.10356-1666852023-05-12T15:36:54Z Text-to-drawing translation with limited data Deng, Ziyang Chen Change Loy School of Computer Science and Engineering ccloy@ntu.edu.sg Engineering::Computer science and engineering Text-to-image translation has seen significant development with the assistance of enormous datasets and novel technologies. OpenAI's CLIP (Contrastive Language-Image Pretraining) is a comprehensive pre-trained neural network that encodes text and image in the same embedding space, thus providing the ability to correlate visual features to semantic words. Popular text-to-image models such as DALL-E 2, VQGAN-CLIP and Stable Diffusion all utilize CLIP's power in some ways. While the market is mainly dominated by autoregressive (AR) and diffusion models, traditional generative adversarial networks (GANs) are capable of producing high-quality images and require much less training data. In this project, with the help of CLIP, we explore the potentials of StyleGAN3 in the context of text-to-image translation, on a custom dataset with 20k text-image pairs. We demonstrate 3 techniques with CLIP: image re-ranking, CLIP loss and CLIP embedding as latent. We experiment with the three settings and find out no positive results in correlations of texts and generated images. We draw the conclusion that despite StyleGAN being powerful on its own, a strong text encoder is equally important to make a good text-to-image AI model. Bachelor of Engineering (Computer Engineering) 2023-05-09T05:12:51Z 2023-05-09T05:12:51Z 2023 Final Year Project (FYP) Deng, Z. (2023). Text-to-drawing translation with limited data. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166685 https://hdl.handle.net/10356/166685 en SCSE22-0309 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Deng, Ziyang
Text-to-drawing translation with limited data
description Text-to-image translation has seen significant development with the assistance of enormous datasets and novel technologies. OpenAI's CLIP (Contrastive Language-Image Pretraining) is a comprehensive pre-trained neural network that encodes text and images into the same embedding space, providing the ability to correlate visual features with semantic words. Popular text-to-image models such as DALL-E 2, VQGAN-CLIP and Stable Diffusion all utilize CLIP's power in some way. While the field is dominated mainly by autoregressive (AR) and diffusion models, traditional generative adversarial networks (GANs) are capable of producing high-quality images while requiring much less training data. In this project, with the help of CLIP, we explore the potential of StyleGAN3 for text-to-image translation on a custom dataset of 20k text-image pairs. We demonstrate three techniques with CLIP: image re-ranking, CLIP loss, and CLIP embedding as latent. Experimenting with all three settings, we find no positive correlation between the input texts and the generated images. We conclude that although StyleGAN is powerful on its own, a strong text encoder is equally important for a good text-to-image model.
author2 Chen Change Loy
author_facet Chen Change Loy
Deng, Ziyang
format Final Year Project
author Deng, Ziyang
author_sort Deng, Ziyang
title Text-to-drawing translation with limited data
title_short Text-to-drawing translation with limited data
title_full Text-to-drawing translation with limited data
title_fullStr Text-to-drawing translation with limited data
title_full_unstemmed Text-to-drawing translation with limited data
title_sort text-to-drawing translation with limited data
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/166685
_version_ 1770564192188760064