Is multi-hop reasoning really explainable? Towards benchmarking reasoning interpretability
Multi-hop reasoning has been widely studied in recent years to obtain more interpretable link prediction. However, we find in experiments that many paths given by these models are actually unreasonable, while little work has been done on evaluating their interpretability. In this paper, we propose a unified framework to quantitatively evaluate the interpretability of multi-hop reasoning models so as to advance their development. Specifically, we define three metrics for evaluation, namely path recall, local interpretability, and global interpretability, and design an approximate strategy to calculate these metrics using the interpretability scores of rules. We manually annotate all possible rules and establish a benchmark. In experiments, we verify the effectiveness of our benchmark. We then run nine representative baselines on the benchmark, and the results show that the interpretability of current multi-hop reasoning models is unsatisfactory, falling 51.7% below the upper bound given by our benchmark. Moreover, rule-based models outperform multi-hop reasoning models in both performance and interpretability, which points to a direction for future research: how to better incorporate rule information into multi-hop reasoning models. We will publish our code and datasets upon acceptance.
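The abstract names three metrics but gives no formulas. The snippet below is a minimal sketch of one plausible reading, assuming that path recall (PR) is the fraction of test triples for which the model returns a reasoning path, that local interpretability (LI) averages the annotated scores of the rules behind the returned paths, and that global interpretability (GI) combines the two multiplicatively. The function name, signatures, and these exact definitions are assumptions inferred from the abstract, not the authors' released implementation.

```python
# Illustrative sketch only: the metric definitions are inferred from the
# abstract, not taken from the authors' code. A "path" is modeled as the
# sequence of relations it traverses, so it maps onto a rule whose
# interpretability score was manually annotated.
from typing import Optional

def evaluate_interpretability(
    predicted_paths: list[Optional[tuple[str, ...]]],  # one entry per test triple; None = no path found
    rule_scores: dict[tuple[str, ...], float],         # annotated interpretability score per rule
) -> dict[str, float]:
    """Approximate path recall (PR), local (LI), and global (GI) interpretability."""
    # PR: fraction of test triples for which the model produced any path.
    found = [p for p in predicted_paths if p is not None]
    pr = len(found) / len(predicted_paths) if predicted_paths else 0.0

    # LI: mean annotated score of the rules matching the returned paths
    # (rules missing from the annotation default to 0 in this sketch).
    li = sum(rule_scores.get(p, 0.0) for p in found) / len(found) if found else 0.0

    # GI: PR and LI combined multiplicatively (an assumption of this sketch).
    return {"PR": pr, "LI": li, "GI": pr * li}

# Toy usage: three test triples, a path returned for two of them.
paths = [("born_in", "capital_of"), None, ("spouse_of",)]
scores = {("born_in", "capital_of"): 1.0, ("spouse_of",): 0.5}
print(evaluate_interpretability(paths, scores))  # PR ~ 0.667, LI = 0.75, GI = 0.5
```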
Main Authors: | LV, Xin; CAO, Yixin; HOU, Lei; LI, Juanzi; LIU, Zhiyuan; ZHANG, Yichi; DAI, Zelin |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2021 |
Subjects: | Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7317 https://ink.library.smu.edu.sg/context/sis_research/article/8320/viewcontent/2021.emnlp_main.700.pdf |
DOI: | 10.18653/v1/2021.emnlp-main.700 |
License: | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Collection: | Research Collection School Of Computing and Information Systems |
Institution: | Singapore Management University |