Enhancing visual grounding in vision-language pre-training with position-guided text prompts

Vision-Language Pre-Training (VLP) has demonstrated remarkable potential in aligning image and text pairs, paving the way for a wide range of cross-modal learning tasks. Nevertheless, we have observed that VLP models often fall short in terms of visual grounding and localization capabilities, which...

Bibliographic Details
Main Authors:	WANG, Alex Jinpeng, ZHOU, Pan, SHOU, Mike Zheng, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Fill-in-the-blank; position-guided text prompt; vision-language pre-training; visual grounding; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing; Programming Languages and Compilers
Online Access:	https://ink.library.smu.edu.sg/sis_research/8742 https://ink.library.smu.edu.sg/context/sis_research/article/9745/viewcontent/VisualGroundingVL_av.pdf
Institution:	Singapore Management University

Similar Items

Position-guided text prompt for vision-language pre-training
by: WANG, Alex Jinpeng, et al.
Published: (2023)

Augmenting low-resource text classification with graph-grounded pre-training and prompting
by: WEN, Zhihao, et al.
Published: (2023)

Prompt tuning on Graph-Augmented Low-Resource text classification
by: WEN, Zhihao, et al.
Published: (2024)

Voucher abuse detection with prompt-based fine-tuning on graph neural networks
by: WEN, Zhihao, et al.
Published: (2023)

ClusterPrompt: Cluster semantic enhanced prompt learning for new intent discovery
by: LIANG, Jinggui, et al.
Published: (2023)

S-prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning
by: WANG, Yabin, et al.
Published: (2022)

Attack prompt generation for red teaming and defending large language models
by: DENG, Boyi, et al.
Published: (2023)

Using pre-trained models for vision-language understanding tasks
by: CAO, Rui
Published: (2024)

Injecting descriptive meta-information into pre-trained language models with hypernetworks
by: DUAN, Wenying, et al.
Published: (2021)

Screening through a broad pool: Towards better diversity for lexically constrained text generation
by: YUAN, Changsen, et al.
Published: (2024)

Graphprompt: Unifying pre-training and downstream tasks for graph neural networks
by: LIU, Zemin, et al.
Published: (2023)

Text-attributed graph representation learning : Methods, applications, and challenges
by: ZHANG, Ce, et al.
Published: (2024)

Collective prompt tuning with relation inference for document-level relation extraction
by: YUAN, Changsen, et al.
Published: (2023)

Towards LLM-based fact verification on news claims with a hierarchical step-by-step prompting method
by: ZHANG, Xuan, et al.
Published: (2023)

Generalized graph prompt: Toward a unification of pre-training and downstream tasks on graphs
by: YU, Xingtong, et al.
Published: (2024)

Compositional prompting video-language models to understand procedure in instructional videos
by: Hu, Guyue, et al.
Published: (2023)

Machine-learning approach to automated doubt identification on stack overflow comments to guide programming learners
by: CHEN, Tianhao, et al.
Published: (2023)

On the transferability of pre-trained language models for low-resource programming languages
by: CHEN, Fuxiang, et al.
Published: (2022)

On true language understanding
by: HO, Seng-Beng, et al.
Published: (2019)

How people prompt generative AI to create interactive VR scenes
by: AGHEL MANESH, Setareh, et al.
Published: (2024)

The influence of different modalities on the design of prompts in a lower limb exergame for cognitively impaired older adults
by: LIOW WEI TING
Published: (2021)

Annotating videos that teach MS Excel and predicting mouse / keyboard actions
by: Tan, Genson Yao Jie
Published: (2024)

A prompt-based topic-modeling method for depression detection on low-resource data
by: GUO, Yanrong, et al.
Published: (2024)

Large language models as source planner for personalized knowledge-grounded dialogues
by: WANG, Hongru, et al.
Published: (2023)

The use of topic representative words in text categorization
by: Kim, S.N., et al.
Published: (2013)

VLStereoSet: A study of stereotypical bias in pre-trained vision-language models
by: ZHOU, Kankan, et al.
Published: (2022)

Effects of bin proximity and informational prompts on recycling and contamination
by: ROSENTHAL, Sonny, et al.
Published: (2021)

Text analytics, NLP, and accounting research
by: CROWLEY, Richard M.
Published: (2020)

What was written vs. Who read it: News media profiling using text analysis and social media context
by: BALY, Ramy, et al.
Published: (2020)

Improving conversational recommender system via contextual and time-aware modeling with less domain-specific knowledge
by: WANG, Lingzhi, et al.
Published: (2024)

Laughter emotion recognition using gestures
by: De Jesus, Paulina Catya S.
Published: (2014)

Sound and complete certificates for quantitative termination analysis of probabilistic programs
by: CHATTERJEE, Krishnendu, et al.
Published: (2022)

Learning control policies for stochastic systems with reach-avoid guarantees
by: ZIKELIC, Dorde, et al.
Published: (2023)

A Prolog-based definition of an entity-relationship language
by: CHAN, H., et al.
Published: (1993)

Revisiting masked auto-encoders for ECG-language representation learning
by: PHAM, Hung Manh, et al.
Published: (2024)

A case study on automated fuzz target generation for large codebases
by: KELLY, Matthew, et al.
Published: (2019)

Prompt sensitivity of transformer variants for text classification
by: Ong, Li Han
Published: (2024)

Weakly supervised video anomaly detection and localization with spatio-temporal prompts
by: WU, Peng, et al.
Published: (2026)

Program evaluation for Easy, C, Pascal programming languages
by: Garcia, Arnaldo, et al.
Published: (1989)

Delineating the Construct Network of the Personnel Reaction Blank: Associations With Externalizing Tendencies and Normal Personality
by: Blonigen, D.M., et al.
Published: (2013)