Grounding referring expressions in images by variational context
We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., 'largest elephant standing behind baby elephant'. This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehens...
Main Authors: | Zhang, Hanwang; Niu, Yulei; Chang, Shih-Fu |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2020 |
Online Access: | https://hdl.handle.net/10356/143054 |
Institution: | Nanyang Technological University |
Similar Items
- Grounding referring expression in computer vision
  by: Yuen, Shaun Chien Wee
  Published: (2024)
- Experimental and numerical investigation of ground heat exchangers in the building foundation
  by: Kayaci, Nurullah, et al.
  Published: (2021)
- A service-oriented middleware for building context-aware services
  by: Gu, T., et al.
  Published: (2013)
- Grounding referring expressions in images with neural module tree network
  by: Tan, Kuan Yeow
  Published: (2022)
- Measurement of different ground effect aircraft designs
  by: Phan, Hector Jun Wen
  Published: (2024)