Learning language to symbol and language to vision mapping for visual grounding
Visual Grounding (VG) is the task of locating a specific object in an image that semantically matches a given linguistic expression. Mapping between linguistic and visual content, and understanding diverse linguistic expressions, are the two main challenges of this task. The performance of visual grou...
Main Authors: He, Su; Yang, Xiaofeng; Lin, Guosheng
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/161552
Institution: Nanyang Technological University
Similar Items
- Is a high tone pointy? Speakers of different languages match Mandarin Chinese tones to visual shapes differently
  by: Shang, Nan, et al.
  Published: (2018)
- Enhancing visual grounding in vision-language pre-training with position-guided text prompts
  by: WANG, Alex Jinpeng, et al.
  Published: (2024)
- Neural logic vision language explainer
  by: Yang, Xiaofeng, et al.
  Published: (2023)
- Demo abstract: VGGlass - Demonstrating visual grounding and localization synergy with a LiDAR-enabled smart-glass
  by: RATHNAYAKE, Darshana, et al.
  Published: (2023)
- Vision language representation learning
  by: Yang, Xiaofeng
  Published: (2023)