Dynamic fusion with intra- and inter-modality attention flow for visual question answering

Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It robustly captures the high-level interactions between the language and vision domains, and thus significantly improves the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
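
The abstract describes two coupled attention mechanisms: an inter-modality flow in which each modality attends to the other, and a dynamic intra-modality flow in which self-attention within one modality is modulated by a summary of the other. The sketch below illustrates that idea in PyTorch. It is not the authors' released implementation: the single-head attention, shared hidden size, mean-pooled conditioning, and sigmoid gating of queries and keys are illustrative assumptions.

```python
# Minimal sketch of intra-/inter-modality attention flow for VQA fusion.
# Assumptions (not from the paper's code): single-head attention, one shared
# hidden size D, and sigmoid gates on queries/keys computed from the
# mean-pooled features of the other modality.
import torch
import torch.nn as nn


class InterModalityAttention(nn.Module):
    """Cross-attention: the target modality x gathers information from modality y."""

    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (B, Nx, D) queries; y: (B, Ny, D) keys/values from the other modality.
        attn = torch.softmax(self.q(x) @ self.k(y).transpose(1, 2) * self.scale, dim=-1)
        return x + attn @ self.v(y)  # residual update of x with inter-modality information


class DynamicIntraModalityAttention(nn.Module):
    """Self-attention within one modality, dynamically gated by the other modality."""

    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.gate_q, self.gate_k = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) target modality; cond: (B, M, D) conditioning modality.
        c = cond.mean(dim=1, keepdim=True)             # (B, 1, D) pooled summary of the other modality
        q = self.q(x) * torch.sigmoid(self.gate_q(c))  # channel-wise modulation of queries ...
        k = self.k(x) * torch.sigmoid(self.gate_k(c))  # ... and keys by the conditioning summary
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return x + attn @ self.v(x)  # residual intra-modality update


# Toy usage: 36 region features and 14 word features, both projected to D = 512.
B, D = 2, 512
v = torch.randn(B, 36, D)  # visual region features (e.g., from an object detector)
q = torch.randn(B, 14, D)  # question word features (e.g., from an RNN encoder)
inter_vq, inter_qv = InterModalityAttention(D), InterModalityAttention(D)
intra_v, intra_q = DynamicIntraModalityAttention(D), DynamicIntraModalityAttention(D)
v, q = inter_vq(v, q), inter_qv(q, v)  # inter-modality flow in both directions
v, q = intra_v(v, q), intra_q(q, v)    # intra-modality flow, each gated by the other modality
```

A full model would stack several such inter/intra blocks so that information alternates between the two modalities, then pool the fused features into an answer classifier; those details are omitted here.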


Bibliographic Details
Main Authors: GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; LI, Hongsheng
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2019
DOI: 10.1109/CVPR.2019.00680
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Subjects: Vision + Language; Vision Applications and Systems; Visual Reasoning; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/5260
https://ink.library.smu.edu.sg/context/sis_research/article/6263/viewcontent/Gao_Dynamic_Fusion_With_Intra__and_Inter_Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf
Institution: Singapore Management University