Deconfounded visual grounding

We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that the bias is the major visual reasoning bottleneck. For example, the grounding process is usually a trivial language-location association without visual reasoning, e.g., grounding any la...

Bibliographic Details
Main Authors:	HUANG, Jianqiang, QIN, Yu, QI, Jiaxin, SUN, Qianru, ZHANG, Hanwang
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Computer Vision (CV); Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/7484 https://ink.library.smu.edu.sg/context/sis_research/article/8487/viewcontent/19983_Article_Text_23996_1_2_20220628.pdf
Institution:	Singapore Management University

Similar Items

Visual Commonsense R-CNN
by: WANG, Tan, et al.
Published: (2020)

Open-set domain adaptation by deconfounding domain gaps
by: ZHAO, Xin, et al.
Published: (2023)

VadCLIP: Adapting vision-language models for weakly supervised video anomaly detection
by: WU, Peng, et al.
Published: (2024)

Reducing adaptation latency for multi-concept visual perception in outdoor environments
by: WIGNESS, Maggie, et al.
Published: (2016)

Visual commonsense representation learning via causal inference
by: WANG, Tan, et al.
Published: (2020)

Causal attention for unbiased visual recognition
by: WANG, Tan, et al.
Published: (2021)

Pixel-wise energy-biased abstention learning for anomaly segmentation on complex urban driving scenes
by: TIAN, Yu, et al.
Published: (2022)

Debiasing NLU models via causal intervention and counterfactual reasoning
by: TIAN, Bing, et al.
Published: (2022)

Symmetry robust descriptor for non-rigid surface matching
by: ZHANG, Zhiyuan, et al.
Published: (2013)

Self-supervised multi-class pre-training for unsupervised anomaly detection and segmentation in medical images
by: TIAN, Yu, et al.
Published: (2021)

Feature prediction diffusion model for video anomaly detection
by: YAN, Cheng, et al.
Published: (2023)

How important is the train-validation split in meta-learning?
by: BAI, Yu, et al.
Published: (2021)

Global context aware convolutions for 3D point cloud understanding
by: ZHANG, Zhiyuan, et al.
Published: (2020)

Test-time augmentation for 3D point cloud classification and segmentation
by: VU, Tuan-Anh, et al.
Published: (2024)

Towards improving system performance in large scale multi-agent systems with selfish agents
by: KUMAR, Rajiv Ranjan
Published: (2022)

Knowledge-aware multimodal fashion chatbot
by: LIAO, Lizi, et al.
Published: (2018)

MLP-3D: A MLP-like 3D architecture with grouped time mixing
by: QIU, Zhaofan, et al.
Published: (2022)

Zero-shot ingredient recognition by multi-relational graph convolutional network
by: CHEN, Jingjing, et al.
Published: (2020)

Gesture enhanced comprehension of ambiguous human-to-robot instructions
by: WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON, et al.
Published: (2020)

Self-trained deep ordinal regression for end-to-end video anomaly detection
by: PANG, Guansong, et al.
Published: (2020)

Edgeduet: Tiling small object detection for edge assisted autonomous mobile vision
by: WANG, Xu, et al.
Published: (2021)

Dynamic temporal filtering in video models
by: LONG, Fuchen, et al.
Published: (2022)

GDFace: Gated deformation for multi-view face image synthesis
by: XU, Xuemiao, et al.
Published: (2020)

Adversarial meta sampling for multilingual low-resource speech recognition
by: XIAO, Yubei, et al.
Published: (2021)

Outlier-robust tensor PCA
by: ZHOU, Pan, et al.
Published: (2016)

Multi-Source Domain Adaptation for Visual Sentiment Classification
by: LIN, Chuang, et al.
Published: (2020)

Counterfactual zero-shot and open-set visual recognition
by: YUE, Zhongqi, et al.
Published: (2021)

Learning to hallucinate face images via component generation and enhancement
by: SONG, Yibing, et al.
Published: (2017)

Learning interpretable concept groups in CNNs
by: VARSHNEYA, Saurabh, et al.
Published: (2021)

Self-supervised learning disentangled group representation as feature
by: WANG, Tan, et al.
Published: (2021)

Transporting causal mechanisms for unsupervised domain adaptation
by: YUE, Zhongqi, et al.
Published: (2021)

VENUS: A geometrical representation for quantum state visualization
by: RUAN, Shaolun, et al.
Published: (2023)

Wave-ViT: Unifying wavelet and transformers for visual representation learning
by: YAO, Ting, et al.
Published: (2022)

Few-shot learner parameterization by diffusion time-steps
by: YUE, Zhongqi, et al.
Published: (2024)

Self-regulation for semantic segmentation
by: ZHANG, Dong, et al.
Published: (2021)

Engaging drivers via competition: A case study with arena
by: CHENG, Hao, et al.
Published: (2021)

Fine-grained domain adaptive crowd counting via point-derived segmentation
by: LIU, Yongtuo, et al.
Published: (2023)

Exploring a multimodal fusion-based deep learning network for detecting facial palsy
by: OO, Heng Yim Nicole, et al.
Published: (2024)

Exploring diffusion time-steps for unsupervised representation learning
by: YUE, Zhongqi, et al.
Published: (2024)

Prompting for multimodal hateful meme classification
by: CAO, Rui, et al.
Published: (2022)