Dynamic fusion with intra- and inter-modality attention flow for visual question answering

Bibliographic Details
Main Authors:	GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; LI, Hongsheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2019
Subjects:	Vision + Language; Vision Applications and Systems; Visual Reasoning; Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/5260 https://ink.library.smu.edu.sg/context/sis_research/article/6263/viewcontent/Gao_Dynamic_Fusion_With_Intra__and_Inter_Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf
Institution:	Singapore Management University

Description
Summary:	Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thereby significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
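
The alternation described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it omits all learned projections and multi-head structure, and the helper names (attend, mean_pool_gate, fusion_step), the sigmoid mean-pool gate, the (1 + gate) rescaling, and the residual additions are assumptions made only for illustration. An inter-modality step lets each modality attend to the other, and an intra-modality step then lets each modality attend to itself while a gate computed from the other modality rescales its queries and keys.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, gate=None):
    """Scaled dot-product attention over lists of equal-length vectors.

    If `gate` is given, queries and keys are rescaled channel-wise by
    (1 + gate) before the dot products (an illustrative assumption).
    """
    if gate is not None:
        queries = [[q * (1.0 + g) for q, g in zip(row, gate)] for row in queries]
        keys = [[k * (1.0 + g) for k, g in zip(row, gate)] for row in keys]
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def mean_pool_gate(features):
    """Hypothetical conditioning gate: sigmoid of the mean feature vector."""
    n = len(features)
    dim = len(features[0])
    return [1.0 / (1.0 + math.exp(-sum(f[d] for f in features) / n)) for d in range(dim)]


def add(xs, ys):
    """Element-wise residual addition of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def fusion_step(visual, words):
    """One inter-modality step followed by one gated intra-modality step."""
    # Inter-modality flow: each modality queries the other one.
    visual_inter = add(visual, attend(visual, words, words))
    words_inter = add(words, attend(words, visual, visual))
    # Intra-modality flow: self-attention, gated by the other modality.
    visual_out = add(visual_inter, attend(visual_inter, visual_inter, visual_inter,
                                          gate=mean_pool_gate(words_inter)))
    words_out = add(words_inter, attend(words_inter, words_inter, words_inter,
                                        gate=mean_pool_gate(visual_inter)))
    return visual_out, words_out


# Toy input: 3 image regions and 2 question tokens, each a 2-dimensional feature.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
tokens = [[0.2, 0.8], [0.9, 0.1]]
fused_regions, fused_tokens = fusion_step(regions, tokens)
```

Calling fusion_step repeatedly on its own outputs would stack several such alternations; how many are stacked, and how the fused features are turned into an answer, is not specified in this record.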

Similar Items