Dynamic fusion with intra- and inter-modality attention flow for visual question answering
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and langu...
Main Authors: | GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; LI, Hongsheng |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2019 |
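Subjects: | Vision + Language Vision Applications and Systems Visual Reasoning Databases and Information Systems |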
Online Access: | https://ink.library.smu.edu.sg/sis_research/5260 https://ink.library.smu.edu.sg/context/sis_research/article/6263/viewcontent/Gao_Dynamic_Fusion_With_Intra__and_Inter_Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-6263 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-6263 2020-07-30T06:58:51Z Dynamic fusion with intra- and inter-modality attention flow for visual question answering GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; LI, Hongsheng Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method. 2019-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/5260 info:doi/10.1109/CVPR.2019.00680 https://ink.library.smu.edu.sg/context/sis_research/article/6263/viewcontent/Gao_Dynamic_Fusion_With_Intra__and_Inter_Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Vision + Language Vision Applications and Systems Visual Reasoning Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Vision + Language Vision Applications and Systems Visual Reasoning Databases and Information Systems |
description |
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method. |
format |
text |
author |
GAO, Peng; JIANG, Zhengkai; YOU, Haoxuan; LU, Pan; HOI, Steven C. H.; WANG, Xiaogang; LI, Hongsheng |
author_sort |
GAO, Peng |
title |
Dynamic fusion with intra- and inter-modality attention flow for visual question answering |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2019 |
url |
https://ink.library.smu.edu.sg/sis_research/5260 https://ink.library.smu.edu.sg/context/sis_research/article/6263/viewcontent/Gao_Dynamic_Fusion_With_Intra__and_Inter_Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf |
_version_ |
1770575364029939712 |