Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics

End-to-end dialog systems are gaining interest due to recent advances in deep neural networks and the availability of large human–human dialog corpora. However, in spite of being of fundamental importance to systematically improve the performance of this kind of system, automatic evaluation of...

Bibliographic Details
Main Authors: D'Haro, Luis Fernando, Banchs, Rafael E., Hori, Chiori, Li, Haizhou
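Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering, Automatic Evaluation Metrics, Dialog Systems
Online Access: https://hdl.handle.net/10356/151218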
id sg-ntu-dr.10356-151218
record_format dspace
spelling sg-ntu-dr.10356-151218
2021-07-02T03:31:40Z
Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
D'Haro, Luis Fernando
Banchs, Rafael E.
Hori, Chiori
Li, Haizhou
School of Computer Science and Engineering
Engineering::Computer science and engineering
Automatic Evaluation Metrics
Dialog Systems
End-to-end dialog systems are gaining interest due to recent advances in deep neural networks and the availability of large human–human dialog corpora. However, in spite of being of fundamental importance to systematically improve the performance of this kind of system, automatic evaluation of the generated dialog utterances is still an unsolved problem. Indeed, most of the proposed objective metrics show low correlation with human evaluations. In this paper, we evaluate a two-dimensional evaluation metric that is designed to operate at sentence level, which considers the syntactic and semantic information carried along the answers generated by an end-to-end dialog system with respect to a set of references. The proposed metric, when applied to outputs generated by the systems participating in track 2 of the DSTC-6 challenge, shows a higher correlation with human evaluations (up to 12.8% relative improvement at the system level) than the best of the alternative state-of-the-art automatic metrics currently available.
2021-07-02T03:31:40Z
2021-07-02T03:31:40Z
2018
Journal Article
D'Haro, L. F., Banchs, R. E., Hori, C. & Li, H. (2018). Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics. Computer Speech and Language, 55, 200-215. https://dx.doi.org/10.1016/j.csl.2018.12.004
0885-2308
0000-0002-4201-7578
https://hdl.handle.net/10356/151218
10.1016/j.csl.2018.12.004
2-s2.0-85059347815
55
200
215
en
Computer Speech and Language
© 2018 Elsevier Ltd. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Automatic Evaluation Metrics
Dialog Systems
spellingShingle Engineering::Computer science and engineering
Automatic Evaluation Metrics
Dialog Systems
D'Haro, Luis Fernando
Banchs, Rafael E.
Hori, Chiori
Li, Haizhou
Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
description End-to-end dialog systems are gaining interest due to recent advances in deep neural networks and the availability of large human–human dialog corpora. However, in spite of being of fundamental importance to systematically improve the performance of this kind of system, automatic evaluation of the generated dialog utterances is still an unsolved problem. Indeed, most of the proposed objective metrics show low correlation with human evaluations. In this paper, we evaluate a two-dimensional evaluation metric that is designed to operate at sentence level, which considers the syntactic and semantic information carried along the answers generated by an end-to-end dialog system with respect to a set of references. The proposed metric, when applied to outputs generated by the systems participating in track 2 of the DSTC-6 challenge, shows a higher correlation with human evaluations (up to 12.8% relative improvement at the system level) than the best of the alternative state-of-the-art automatic metrics currently available.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
D'Haro, Luis Fernando
Banchs, Rafael E.
Hori, Chiori
Li, Haizhou
format Article
author D'Haro, Luis Fernando
Banchs, Rafael E.
Hori, Chiori
Li, Haizhou
author_sort D'Haro, Luis Fernando
title Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
title_short Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
title_full Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
title_fullStr Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
title_full_unstemmed Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
title_sort automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
publishDate 2021
url https://hdl.handle.net/10356/151218
_version_ 1705151320148672512