CgT-GAN: CLIP-guided text GAN for image captioning

The large-scale visual-language pre-trained model, Contrastive Language-Image Pre-training (CLIP), has significantly improved image captioning for scenarios without human-annotated image-caption pairs. Recent advanced CLIP-based image captioning without human annotations follows a text-only training...

Bibliographic Details
Main Authors:	YU, Jiarui, LI, Haoran, HAO, Yanbin, ZHU, Bin, XU, Tong, HE, Xiangnan
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Image captioning; CLIP; Reinforcement learning; GAN; Graphics and Human Computer Interfaces; Programming Languages and Compilers
Online Access:	https://ink.library.smu.edu.sg/sis_research/9012 https://ink.library.smu.edu.sg/context/sis_research/article/10015/viewcontent/CgT_GAN.pdf
Institution:	Singapore Management University

Similar Items

Improving GAN training with probability ratio clipping and sample reweighting
by: WU, Yue, et al.
Published: (2020)

PERSONALIZED VISUAL INFORMATION CAPTIONING
by: WU SHUANG
Published: (2023)

Context-aware visual policy network for fine-grained image captioning
by: Zha, Zheng-Jun, et al.
Published: (2022)

Clip-based similarity measure for hierarchical video retrieval
by: PENG, Yuxin, et al.
Published: (2004)

Approach for video retrieval by video clip
by: PENG, Y., et al.
Published: (2003)

Clip-based similarity measure for query-dependent clip retrieval and video summarization
by: PENG, Yuxin, et al.
Published: (2006)

Position-guided text prompt for vision-language pre-training
by: WANG, Alex Jinpeng, et al.
Published: (2023)

Wasserstein divergence for GANs
by: WU, J., et al.
Published: (2018)

Dynamic captioning: Video accessibility enhancement for hearing impairment
by: Hong, R., et al.
Published: (2013)

Opinion question answering by sentiment clip localization
by: PANG, Lei, et al.
Published: (2016)

Deconfounded image captioning: a causal retrospect
by: Yang, Xu, et al.
Published: (2022)

Vertical GaN-on-GaN Schottky diodes as α-particle radiation sensors
by: Sandupatla, Abhinay, et al.
Published: (2020)

Image captioning via semantic element embedding
by: ZHANG, Xiaodan, et al.
Published: (2020)

Cross-modal graph with meta concepts for video captioning
by: Wang, Hao, et al.
Published: (2022)

A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
by: Liu, A.-A., et al.
Published: (2021)

Neural radiance selector: find the best 2D representations of 3D data for CLIP based 3D tasks
by: Yang, Xiaofeng, et al.
Published: (2024)

Learning to collocate Visual-Linguistic Neural Modules for image captioning
by: Yang, Xu, et al.
Published: (2023)

Study on the application of percutaneous transhepatic endoscopic lithotripsy with a tunnel into the common hepatic duct for the treatment of intrahepatic stones
by: Trần, Doanh Hiệu
Published: (2020)

Learning transferable perturbations for image captioning
by: WU, Hanjie, et al.
Published: (2022)

Stack-VS : stacked visual-semantic attention for image caption generation
by: Cheng, Ling, et al.
Published: (2021)

Keyword-driven image captioning via Context-dependent Bilateral LSTM
by: ZHANG, Xiaodan, et al.
Published: (2017)

Exploiting the image prior in CLIP for super-resolution
by: Chen, Xingyu
Published: (2024)

AmpSum: adaptive multiple-product summarization towards improving recommendation captions
by: TRUONG, Quoc Tuan, et al.
Published: (2022)

Interactive change-aware transformer network for remote sensing image change captioning
by: Cai, Chen, et al.
Published: (2024)

RIGID: Recurrent GAN inversion and editing of real face videos
by: XU, Yangyang, et al.
Published: (2023)

Liver cancer risk from fermented soybean paste
Published: (2017)

Automated localization of affective objects and actions in images via caption text-cum-eye gaze analysis
by: Ramanathan, S., et al.
Published: (2013)

Exploring the video clip presentation as a performance assessment task in a meaningful engaged learning environment
by: Miranda, Joanne Rieta
Published: (2009)

Investigation of optical properties of nanoporous GaN films
by: Vajpeyi, A.P., et al.
Published: (2014)

Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature
by: Xu, Yuecong, et al.
Published: (2021)

Study of the value of GP73 in the diagnosis of liver cancer in patients infected with hepatitis B virus
by: Trần, Thị Thanh Huyền
Published: (2016)

GaN-based semiconductor saturable absorber mirror operating around 415 nm
by: Xiang, N., et al.
Published: (2014)

More is better : precise and detailed image captioning using online positive recall and missing concepts mining
by: Zhang, Mingxing, et al.
Published: (2020)

Advantages of the Blue InGaN/GaN Light-Emitting Diodes with an AlGaN/GaN/AlGaN Quantum Well Structured Electron Blocking Layer
by: Ju, Zhen Gang, et al.
Published: (2016)

EMD-based video clip retrieval by many-to-many matching
by: PENG, Yuxin, et al.
Published: (2005)

Identification of deep levels in π-GaN epilayers
by: Soh, C.B., et al.
Published: (2014)

Fabrication and characterization of AlGaN/GaN HEMTs
by: HOY KIN MENG, DERRICK
Published: (2010)

Cascade EF-GAN : progressive facial expression editing with local focuses
by: Wu, Rongliang, et al.
Published: (2021)

Video clip retrieval by maximal matching and optimal matching in graph theory
by: PENG, Yu-Xin, et al.
Published: (2003)

Changes in platelet count in patients with cirrhosis
by: Phạm, Văn Hoàng
Published: (2020)