Visual Commonsense R-CNN

We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., us...

Bibliographic Details
Main Authors:	WANG, Tan; HUANG, Jianqiang; ZHANG, Hanwang; SUN, Qianru
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/5592 https://ink.library.smu.edu.sg/context/sis_research/article/6595/viewcontent/CVPR2020_VC_R_CNN.pdf
Institution:	Singapore Management University

Similar Items

Visual commonsense representation learning via causal inference
by: WANG, Tan, et al.
Published: (2020)

Deconfounded visual grounding
by: HUANG, Jianqiang, et al.
Published: (2022)

Reducing adaptation latency for multi-concept visual perception in outdoor environments
by: WIGNESS, Maggie, et al.
Published: (2016)

Causal attention for unbiased visual recognition
by: WANG, Tan, et al.
Published: (2021)

Feature prediction diffusion model for video anomaly detection
by: YAN, Cheng, et al.
Published: (2023)

Debiasing NLU models via causal intervention and counterfactual reasoning
by: TIAN, Bing, et al.
Published: (2022)

Symmetry robust descriptor for non-rigid surface matching
by: ZHANG, Zhiyuan, et al.
Published: (2013)

Global context aware convolutions for 3D point cloud understanding
by: ZHANG, Zhiyuan, et al.
Published: (2020)

Edgeduet: Tiling small object detection for edge assisted autonomous mobile vision
by: WANG, Xu, et al.
Published: (2021)

Test-time augmentation for 3D point cloud classification and segmentation
by: VU, Tuan-Anh, et al.
Published: (2024)

Zero-shot ingredient recognition by multi-relational graph convolutional network
by: CHEN, Jingjing, et al.
Published: (2020)

How important is the train-validation split in meta-learning?
by: BAI, Yu, et al.
Published: (2021)

Gesture enhanced comprehension of ambiguous human-to-robot instructions
by: WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON, et al.
Published: (2020)

Self-trained deep ordinal regression for end-to-end video anomaly detection
by: PANG, Guansong, et al.
Published: (2020)

GDFace: Gated deformation for multi-view face image synthesis
by: XU, Xuemiao, et al.
Published: (2020)

Towards improving system performance in large scale multi-agent systems with selfish agents
by: KUMAR, Rajiv Ranjan
Published: (2022)

Knowledge-aware multimodal fashion chatbot
by: LIAO, Lizi, et al.
Published: (2018)

MLP-3D: A MLP-like 3D architecture with grouped time mixing
by: QIU, Zhaofan, et al.
Published: (2022)

Self-supervised multi-class pre-training for unsupervised anomaly detection and segmentation in medical images
by: TIAN, Yu, et al.
Published: (2021)

Dynamic temporal filtering in video models
by: LONG, Fuchen, et al.
Published: (2022)

Pixel-wise energy-biased abstention learning for anomaly segmentation on complex urban driving scenes
by: TIAN, Yu, et al.
Published: (2022)

Adversarial meta sampling for multilingual low-resource speech recognition
by: XIAO, Yubei, et al.
Published: (2021)

Outlier-robust tensor PCA
by: ZHOU, Pan, et al.
Published: (2016)

Learning to hallucinate face images via component generation and enhancement
by: SONG, Yibing, et al.
Published: (2017)

Self-supervised learning disentangled group representation as feature
by: WANG, Tan, et al.
Published: (2021)

VENUS: A geometrical representation for quantum state visualization
by: RUAN, Shaolun, et al.
Published: (2023)

Transporting causal mechanisms for unsupervised domain adaptation
by: YUE, Zhongqi, et al.
Published: (2021)

Few-shot learner parameterization by diffusion time-steps
by: YUE, Zhongqi, et al.
Published: (2024)

Self-regulation for semantic segmentation
by: ZHANG, Dong, et al.
Published: (2021)

Engaging drivers via competition: A case study with arena
by: CHENG, Hao, et al.
Published: (2021)

On the use of commonsense ontology for multimedia event recounting
by: TAN, Chun-Chet, et al.
Published: (2016)

Human-centered interaction in virtual worlds: A new era of generative artificial intelligence and metaverse
by: WANG, Yuying, et al.
Published: (2024)

Prompting for multimodal hateful meme classification
by: CAO, Rui, et al.
Published: (2022)

HCI in business and organizations: Digital transformation with HCI, metaverse, and AI technologies
by: HUO, Xuenan, et al.
Published: (2024)

Exploring diffusion time-steps for unsupervised representation learning
by: YUE, Zhongqi, et al.
Published: (2024)

Video event detection using motion relativity and visual relatedness
by: WANG, Feng, et al.
Published: (2008)

Open-set domain adaptation by deconfounding domain gaps
by: ZHAO, Xin, et al.
Published: (2023)

VIOLET: Visual Analytics for Explainable Quantum Neural Networks
by: RUAN, Shaolun, et al.
Published: (2024)

Agent-augmented Co-Space: Toward merging of real world and cyberspace
by: TAN, Ah-hwee, et al.
Published: (2010)

Unifying global-local representations in salient object detection with transformers
by: REN, Sucheng, et al.
Published: (2024)