Cycle-consistent inverse GAN for text-to-image synthesis
This paper investigates an open research task of text-to-image synthesis for automatically generating or manipulating images from text descriptions. Prevailing methods mainly take the textual descriptions as the conditional input for the GAN generation, and need to train different models for the text-guided image generation and manipulation tasks. In this paper, we propose a novel unified framework of Cycle-consistent Inverse GAN (CI-GAN) for both text-to-image generation and text-guided image manipulation tasks. Specifically, we first train a GAN model without text input, aiming to generate images with high diversity and quality. Then we learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image, where we introduce cycle-consistency training to learn more robust and consistent inverted latent codes. We further uncover the semantics of the latent space of the trained GAN model by learning a similarity model between text representations and the latent codes. In the text-guided optimization module, we can generate images with the desired semantic attributes through optimization on the inverted latent codes. Extensive experiments on the Recipe1M and CUB datasets validate the efficacy of our proposed framework.
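The record contains no code, but the pipeline outlined in the abstract (a text-free pretrained generator, a cycle-consistent inversion encoder, a text/latent similarity model, and text-guided optimization of inverted latent codes) can be illustrated with a short sketch. The following is a minimal PyTorch example; the modules `G`, `E` and `sim_head`, their dimensions, and the loss weights are hypothetical stand-ins for illustration only, not the authors' released implementation.

```python
# Minimal PyTorch sketch (hypothetical, not the authors' code) of the pipeline
# described in the abstract: a pretrained text-free generator G, an inversion
# encoder E trained with a cycle-consistency objective, a text/latent similarity
# model, and text-guided optimisation of an inverted latent code.
import torch
import torch.nn.functional as F

LATENT_DIM, TEXT_DIM, IMG_DIM = 128, 256, 3 * 64 * 64   # toy sizes (assumptions)

G = torch.nn.Linear(LATENT_DIM, IMG_DIM)                # stand-in pretrained generator
E = torch.nn.Linear(IMG_DIM, LATENT_DIM)                # stand-in GAN inversion encoder
sim_head = torch.nn.Bilinear(TEXT_DIM, LATENT_DIM, 1)   # stand-in similarity model

def cycle_consistency_loss(w):
    """Re-invert the generated image and ask for the original latent code back."""
    w_rec = E(G(w))
    return F.mse_loss(w_rec, w)

def text_guided_optimisation(w_init, text_emb, steps=100, lr=0.05,
                             lambda_cyc=1.0, lambda_reg=0.1):
    """Optimise an inverted latent code so the generated image matches the text.

    Loss weights, learning rate and step count are illustrative assumptions only.
    """
    w_init = w_init.detach()
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        match = sim_head(text_emb, w).mean()              # text/latent similarity score
        loss = (-match
                + lambda_cyc * cycle_consistency_loss(w)  # keep w consistent under inversion
                + lambda_reg * F.mse_loss(w, w_init))     # stay close to the source image
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

# Usage with random tensors standing in for a real image and text encoder.
w0 = E(torch.randn(1, IMG_DIM))           # "inverted" latent code of an input image
t = torch.randn(1, TEXT_DIM)              # embedding of the target text description
w_edit = text_guided_optimisation(w0, t)
edited_image = G(w_edit)                  # image reflecting the desired attributes
```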
Main Authors: | Wang, Hao, Lin, Guosheng, Hoi, Steven C. H., Miao, Chunyan |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2022 |
Subjects: | Engineering::Computer science and engineering; Text-to-Image Synthesis; Cycle consistency |
Online Access: | https://hdl.handle.net/10356/156034 |
Institution: | Nanyang Technological University |
Conference: | 29th ACM International Conference on Multimedia (MM '21) |
---|---|
Research Centre: | Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY) |
Citation: | Wang, H., Lin, G., Hoi, S. C. H. & Miao, C. (2021). Cycle-consistent inverse GAN for text-to-image synthesis. 29th ACM International Conference on Multimedia (MM '21), 630-638. https://dx.doi.org/10.1145/3474085.3475226 |
DOI: | 10.1145/3474085.3475226 |
ISBN: | 9781450386517 |
Pages: | 630-638 |
Version: | Submitted/Accepted version |
Funding Agencies: | AI Singapore; Ministry of Education (MOE); Ministry of Health (MOH); National Research Foundation (NRF) |
Grants: | AISG-GC-2019-003; NRF-NRFI05-2019-0002; MOH/NIC/COG04/2017; MOH/NIC/HAIG03/2017; RG28/18 (S); RG22/19 (S) |
Funding Statement: | This research is supported, in part, by the National Research Foundation (NRF), Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No. NRF-NRFI05-2019-0002). This research is also supported, in part, by the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No. MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017), and the MOE Tier-1 research grants RG28/18 (S) and RG22/19 (S). |
Rights: | © 2021 Association for Computing Machinery. All rights reserved. This paper was published in Proceedings of the 29th ACM International Conference on Multimedia (MM '21) and is made available with permission of Association for Computing Machinery. |
Format: | application/pdf |