Cycle-consistent inverse GAN for text-to-image synthesis

This paper investigates an open research task of text-to-image synthesis for automatically generating or manipulating images from text descriptions. Prevailing methods mainly take the textual descriptions as the conditional input for the GAN generation, and need to train different models for the tex...

Bibliographic Details
Main Authors: Wang, Hao; Lin, Guosheng; Hoi, Steven C. H.; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Text-to-Image Synthesis; Cycle consistency
Online Access: https://hdl.handle.net/10356/156034
Institution: Nanyang Technological University
id sg-ntu-dr.10356-156034
record_format dspace
spelling sg-ntu-dr.10356-156034 2022-04-01T06:07:10Z
Cycle-consistent inverse GAN for text-to-image synthesis
Wang, Hao; Lin, Guosheng; Hoi, Steven C. H.; Miao, Chunyan
School of Computer Science and Engineering
29th ACM International Conference on Multimedia (MM '21)
Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY)
Engineering::Computer science and engineering; Text-to-Image Synthesis; Cycle consistency
AI Singapore; Ministry of Education (MOE); Ministry of Health (MOH); National Research Foundation (NRF)
Submitted/Accepted version
This research is supported, in part, by the National Research Foundation (NRF), Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No. NRF-NRFI05-2019-0002). This research is also supported, in part, by the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No. MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017), and the MOE Tier-1 research grants: RG28/18 (S) and RG22/19 (S).
2022-04-01T06:07:10Z 2022-04-01T06:07:10Z 2021
Conference Paper
Wang, H., Lin, G., Hoi, S. C. H. & Miao, C. (2021). Cycle-consistent inverse GAN for text-to-image synthesis. 29th ACM International Conference on Multimedia (MM '21), 630-638. https://dx.doi.org/10.1145/3474085.3475226
9781450386517
https://hdl.handle.net/10356/156034
10.1145/3474085.3475226
630 638
en
AISG-GC-2019-003 NRF-NRFI05-2019-0002 MOH/NIC/COG04/2017 MOH/NIC/HAIG03/2017 RG28/18 (S) RG22/19 (S)
© 2021 Association for Computing Machinery. All rights reserved. This paper was published in Proceedings of the 29th ACM International Conference on Multimedia (MM '21) and is made available with permission of Association for Computing Machinery.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Text-to-Image Synthesis
Cycle consistency
description This paper investigates an open research task of text-to-image synthesis for automatically generating or manipulating images from text descriptions. Prevailing methods mainly take the textual descriptions as the conditional input for the GAN generation, and need to train different models for the text-guided image generation and manipulation tasks. In this paper, we propose a novel unified framework of Cycle-consistent Inverse GAN (CI-GAN) for both text-to-image generation and text-guided image manipulation tasks. Specifically, we first train a GAN model without text input, aiming to generate images with high diversity and quality. Then we learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image, where we introduce the cycle-consistency training to learn more robust and consistent inverted latent codes. We further uncover the semantics of the latent space of the trained GAN model, by learning a similarity model between text representations and the latent codes. In the text-guided optimization module, we can generate images with the desired semantic attributes through optimization on the inverted latent codes. Extensive experiments on the Recipe1M and CUB datasets validate the efficacy of our proposed framework.
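The cycle-consistency idea mentioned in the description can be illustrated with a toy sketch. The abstract does not state the actual loss functions or network architectures, so everything below is an assumption made for illustration: the linear "generator" and "encoder", the names `generator`, `encoder`, and `cycle_consistency_loss`, and the use of mean squared error are all hypothetical stand-ins, not the authors' implementation. The sketch only shows that an inversion model which maps generated images back to their latent codes (and latent codes forward to the same images) yields a near-zero cycle loss, while a perturbed inversion model does not.

```python
import numpy as np

def generator(W, z):
    """Toy linear stand-in for a GAN generator: latent codes (n, d_z) -> images (n, d_x)."""
    return z @ W.T

def encoder(V, x):
    """Toy linear stand-in for a GAN inversion model: images (n, d_x) -> latent codes (n, d_z)."""
    return x @ V.T

def cycle_consistency_loss(W, V, z, x):
    """Latent-cycle MSE plus image-cycle MSE (an assumed form, not taken from the paper)."""
    z_cycle = encoder(V, generator(W, z))   # z -> G(z) -> E(G(z))
    x_cycle = generator(W, encoder(V, x))   # x -> E(x) -> G(E(x))
    latent_term = np.mean((z_cycle - z) ** 2)
    image_term = np.mean((x_cycle - x) ** 2)
    return latent_term + image_term

rng = np.random.default_rng(0)
d_z, d_x, n = 4, 16, 8

W = rng.normal(size=(d_x, d_z))               # generator weights
z = rng.normal(size=(n, d_z))                 # sampled latent codes
x = generator(W, rng.normal(size=(n, d_z)))   # "images" that lie in the generator's range

V_exact = np.linalg.pinv(W)                   # exact left inverse of the toy generator
V_noisy = V_exact + 0.1 * rng.normal(size=V_exact.shape)  # imperfect inversion model

loss_exact = cycle_consistency_loss(W, V_exact, z, x)
loss_noisy = cycle_consistency_loss(W, V_noisy, z, x)
print("exact inversion loss:", loss_exact)
print("noisy inversion loss:", loss_noisy)
```

In a real setting both models would be deep networks and the loss would be minimized by gradient descent on the inversion model's parameters; the linear case is used here only because the exact inverse is available in closed form for comparison.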
author2 School of Computer Science and Engineering
format Conference or Workshop Item
author Wang, Hao
Lin, Guosheng
Hoi, Steven C. H.
Miao, Chunyan
author_sort Wang, Hao
title Cycle-consistent inverse GAN for text-to-image synthesis
publishDate 2022
url https://hdl.handle.net/10356/156034