CgT-GAN: CLIP-guided text GAN for image captioning

The large-scale visual-language pre-trained model, Contrastive Language-Image Pre-training (CLIP), has significantly improved image captioning for scenarios without human-annotated image-caption pairs. Recent advanced CLIP-based image captioning without human annotations follows a text-only training...

Bibliographic Details
Main Authors: YU, Jiarui; LI, Haoran; HAO, Yanbin; ZHU, Bin; XU, Tong; HE, Xiangnan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects: GAN
Online Access: https://ink.library.smu.edu.sg/sis_research/9012
https://ink.library.smu.edu.sg/context/sis_research/article/10015/viewcontent/CgT_GAN.pdf
Institution: Singapore Management University