Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation
This work extends our participation in the Dialogue System Technology Challenge (DSTC7), where we took part in the Audio Visual Scene-aware Dialogue System (AVSD) track. The AVSD track evaluates how well dialogue systems understand video scenes and respond to users about the visual and audio content of a video. We propose a hierarchical attention approach over user queries, video captions, and audio and visual features that contributes to improved evaluation results. We also apply a nonlinear feature fusion approach that combines the visual and audio features for better knowledge representation. Our proposed model shows superior performance in terms of both objective evaluation and human rating compared to the baselines. In this extended work, we also provide a more extensive review of related work, conduct additional experiments with word-level and context-level pretrained embeddings, and investigate different qualitative aspects of the generated responses.
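The abstract names two main components: query-guided hierarchical attention over the caption, visual, and audio streams, and a nonlinear fusion of the attended audio-visual features. The PyTorch sketch below is only an illustration of that general idea, not the authors' released implementation; the class names, hidden size, and the gated-tanh form of the fusion are assumptions made for the example.

```python
# Minimal sketch, assuming query-guided attention within each modality,
# a gated-tanh (nonlinear) fusion of the attended audio and visual vectors,
# and a second attention level across modality summaries. Names and sizes
# are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryGuidedAttention(nn.Module):
    """Attend over a sequence of modality features conditioned on the query."""

    def __init__(self, query_dim, feat_dim, hidden_dim=256):
        super().__init__()
        self.proj_q = nn.Linear(query_dim, hidden_dim)
        self.proj_f = nn.Linear(feat_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, query, feats):
        # query: (B, Dq), feats: (B, T, Df)
        energy = torch.tanh(self.proj_q(query).unsqueeze(1) + self.proj_f(feats))
        weights = F.softmax(self.score(energy), dim=1)   # (B, T, 1)
        return (weights * feats).sum(dim=1)              # (B, Df)


class HierarchicalMultimodalAttention(nn.Module):
    """Two attention levels: within each modality, then across the
    per-modality summaries, with nonlinear audio-visual fusion."""

    def __init__(self, query_dim, caption_dim, visual_dim, audio_dim, hidden_dim=256):
        super().__init__()
        self.att_caption = QueryGuidedAttention(query_dim, caption_dim, hidden_dim)
        self.att_visual = QueryGuidedAttention(query_dim, visual_dim, hidden_dim)
        self.att_audio = QueryGuidedAttention(query_dim, audio_dim, hidden_dim)
        # Project each attended summary into a shared space.
        self.to_shared = nn.ModuleDict({
            "caption": nn.Linear(caption_dim, hidden_dim),
            "visual": nn.Linear(visual_dim, hidden_dim),
            "audio": nn.Linear(audio_dim, hidden_dim),
        })
        # Assumed nonlinear fusion: gated tanh over concatenated audio-visual vectors.
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)
        # Second-level attention over the modality summaries.
        self.modality_score = nn.Linear(hidden_dim, 1)

    def forward(self, query, caption_feats, visual_feats, audio_feats):
        cap = self.to_shared["caption"](self.att_caption(query, caption_feats))
        vis = self.to_shared["visual"](self.att_visual(query, visual_feats))
        aud = self.to_shared["audio"](self.att_audio(query, audio_feats))
        av = torch.cat([vis, aud], dim=-1)
        av_fused = torch.tanh(self.fuse(av)) * torch.sigmoid(self.gate(av))
        summaries = torch.stack([cap, vis, aud, av_fused], dim=1)  # (B, 4, H)
        weights = F.softmax(self.modality_score(torch.tanh(summaries)), dim=1)
        return (weights * summaries).sum(dim=1)  # context vector for the response decoder
```

The returned context vector would feed a response decoder (e.g. an RNN or Transformer decoder) at each generation step; that decoder is omitted here to keep the sketch short.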
Main Authors: | LE, Hung; SAHOO, Doyen; CHEN, Nancy F.; HOI, Steven C. H. |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
Subjects: | Audio-visual scene-aware dialogue; Dialogue system; Multimodal attention; Neural network; Response generation; Databases and Information Systems |
DOI: | 10.1016/j.csl.2020.101095 |
License: | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Collection: | Research Collection School Of Computing and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/5259 https://ink.library.smu.edu.sg/context/sis_research/article/6262/viewcontent/Hierarchical_multimodal_attention_av.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-6262 |
---|---|
record_format | dspace |