Dynamic fusion with intra- and inter-modality attention flow for visual question answering

Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It robustly captures the high-level interactions between the language and vision domains, and thus significantly improves the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
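
The abstract describes two coupled attention mechanisms: an inter-modality flow in which each modality attends to the other, and a dynamic intra-modality flow in which self-attention within one modality is modulated by a summary of the other. The sketch below illustrates that idea in PyTorch. It is not the authors' released implementation: the single-head attention, shared hidden size, mean-pooled conditioning, and sigmoid gating of queries and keys are illustrative assumptions.

```python
# Minimal sketch of intra-/inter-modality attention flow for VQA fusion.
# Assumptions (not from the paper's code): single-head attention, one shared
# hidden size D, and sigmoid gates on queries/keys computed from the
# mean-pooled features of the other modality.
import torch
import torch.nn as nn


class InterModalityAttention(nn.Module):
    """Cross-attention: the target modality x gathers information from modality y."""

    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (B, Nx, D) queries; y: (B, Ny, D) keys/values from the other modality.
        attn = torch.softmax(self.q(x) @ self.k(y).transpose(1, 2) * self.scale, dim=-1)
        return x + attn @ self.v(y)  # residual update of x with inter-modality information


class DynamicIntraModalityAttention(nn.Module):
    """Self-attention within one modality, dynamically gated by the other modality."""

    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.gate_q, self.gate_k = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) target modality; cond: (B, M, D) conditioning modality.
        c = cond.mean(dim=1, keepdim=True)             # (B, 1, D) pooled summary of the other modality
        q = self.q(x) * torch.sigmoid(self.gate_q(c))  # channel-wise modulation of queries ...
        k = self.k(x) * torch.sigmoid(self.gate_k(c))  # ... and keys by the conditioning summary
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return x + attn @ self.v(x)  # residual intra-modality update


# Toy usage: 36 region features and 14 word features, both projected to D = 512.
B, D = 2, 512
v = torch.randn(B, 36, D)  # visual region features (e.g., from an object detector)
q = torch.randn(B, 14, D)  # question word features (e.g., from an RNN encoder)
inter_vq, inter_qv = InterModalityAttention(D), InterModalityAttention(D)
intra_v, intra_q = DynamicIntraModalityAttention(D), DynamicIntraModalityAttention(D)
v, q = inter_vq(v, q), inter_qv(q, v)  # inter-modality flow in both directions
v, q = intra_v(v, q), intra_q(q, v)    # intra-modality flow, each gated by the other modality
```

A full model would stack several such inter/intra blocks so that information alternates between the two modalities, then pool the fused features into an answer classifier; those details are omitted here.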


Bibliographic Details
Main Authors: GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; LI, Hongsheng
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2019
DOI: 10.1109/CVPR.2019.00680
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Subjects: Vision + Language; Vision Applications and Systems; Visual Reasoning; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/5260
https://ink.library.smu.edu.sg/context/sis_research/article/6263/viewcontent/Gao_Dynamic_Fusion_With_Intra__and_Inter_Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf
Institution: Singapore Management University