Exploring duality in visual question-driven top-down saliency

Top-down, goal-driven visual saliency exerts a strong influence on the human visual system when performing visual tasks. Text generation tasks, such as visual question answering (VQA) and visual question generation (VQG), have intrinsic connections with top-down saliency, which is usually involved in both VQA and VQG in an unsupervised manner. However, the regions humans choose to look at when answering questions have been shown to differ substantially from those highlighted by unsupervised attention models. In this brief, we explore the intrinsic relationship between top-down saliency and text generation, and investigate whether an accurate saliency response benefits text generation. To this end, we propose a dual supervised network with dynamic parameter prediction. Dual supervision explicitly exploits the probabilistic correlation between the primal task, top-down saliency detection, and the dual task, text generation, while dynamic parameter prediction encodes the given text (i.e., question or answer) into a fully convolutional network. Extensive experiments show that the proposed top-down saliency method achieves the best correlation with human attention among various baselines. In addition, the proposed model can be guided by either questions or answers and output the counterpart. Furthermore, we show that incorporating human-like visual question saliency improves the performance of both answer and question generation.
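The abstract states that dynamic parameter prediction encodes the given question or answer into a fully convolutional network. The sketch below is only a minimal illustration of that general idea, not the authors' implementation: it assumes a hypernetwork reading in which a linear layer predicts per-sample 1x1 convolution weights from a text embedding and applies them to image features to produce a text-conditioned saliency map. The module name, layer sizes, and kernel choice are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a hypernetwork predicts a per-sample
# 1x1 conv kernel from a question/answer embedding and applies it to image
# features, yielding a text-conditioned saliency map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicSaliencyHead(nn.Module):
    def __init__(self, text_dim=512, feat_channels=256):
        super().__init__()
        # Predicts the kernel of a 1x1 conv (feat_channels -> 1) from the text embedding.
        self.param_predictor = nn.Linear(text_dim, feat_channels)

    def forward(self, image_feats, text_emb):
        # image_feats: (B, C, H, W) convolutional features of the image
        # text_emb:    (B, text_dim) embedding of the question or answer
        B, C, H, W = image_feats.shape
        kernels = self.param_predictor(text_emb).view(B, C, 1, 1)  # per-sample 1x1 kernels
        # Grouped conv applies each sample's predicted kernel to its own feature map.
        feats = image_feats.reshape(1, B * C, H, W)
        saliency = F.conv2d(feats, kernels, groups=B)
        return torch.sigmoid(saliency.view(B, 1, H, W))  # text-conditioned saliency map

if __name__ == "__main__":
    head = DynamicSaliencyHead()
    img = torch.randn(2, 256, 14, 14)   # dummy CNN feature maps
    txt = torch.randn(2, 512)           # dummy question/answer embedding
    print(head(img, txt).shape)         # torch.Size([2, 1, 14, 14])
```

The grouped convolution is just a batching trick here: it lets each sample in the batch be filtered by its own predicted kernel in a single call.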

Bibliographic Details
Main Authors: HE, Shengfeng, HAN, Chu, HAN, Guoqiang, QIN, Jing
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
Subjects: Task analysis; Visualization; Feature extraction; Training; Pipelines; Learning systems; Knowledge discovery; Dual learning; saliency; visual question answering (VQA); visual question generation (VQG); Information Security
Online Access: https://ink.library.smu.edu.sg/sis_research/7857
DOI: 10.1109/TNNLS.2019.2933439
Collection: Research Collection School Of Computing and Information Systems, InK@SMU (SMU Libraries)
Institution: Singapore Management University