Answer summarization for technical queries: Benchmark and new approach

Prior studies have demonstrated that approaches that generate an answer summary for a given technical query on Software Question and Answer (SQA) sites are desirable. We find that existing approaches are assessed solely through user studies, so a new user study must be performed every time a new...

Full description

Bibliographic Details
Main Authors: YANG, Chengran, XU, Bowen, THUNG, Ferdian, SHI, Yucen, ZHANG, Ting, YANG, Zhou, ZHOU, Xin, SHI, Jieke, HE, Junda, HAN, DongGyun, LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: Summarization; Question retrieval; Pre-trained models; Artificial Intelligence and Robotics; Software Engineering
Online Access:https://ink.library.smu.edu.sg/sis_research/7714
https://ink.library.smu.edu.sg/context/sis_research/article/8717/viewcontent/2209.10868.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-8717
record_format dspace
spelling sg-smu-ink.sis_research-8717 2023-09-12T07:38:19Z Answer summarization for technical queries: Benchmark and new approach 2022-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7714 info:doi/10.1145/3551349.3560421 https://ink.library.smu.edu.sg/context/sis_research/article/8717/viewcontent/2209.10868.pdf http://creativecommons.org/licenses/by/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Summarization
Question retrieval
Pre-trained models
Artificial Intelligence and Robotics
Software Engineering
description Prior studies have demonstrated that approaches that generate an answer summary for a given technical query on Software Question and Answer (SQA) sites are desirable. We find that existing approaches are assessed solely through user studies, so a new user study must be performed every time a new approach is introduced; this is time-consuming, slows the development of new approaches, and yields results that may not be comparable across studies. There is a need for a benchmark with ground-truth summaries to complement assessment through user studies. Unfortunately, no such benchmark exists for answer summarization of technical queries from SQA sites. To fill this gap, we manually construct a high-quality benchmark that enables automatic evaluation of answer summarization for technical queries on SQA sites. It contains 111 query-summary pairs extracted from 382 Stack Overflow answers comprising 2,014 candidate sentences. Using the benchmark, we comprehensively evaluate existing approaches and find that there is still substantial room for improvement. Motivated by these results, we propose a new approach, TechSumBot, with three key modules: 1) a Usefulness Ranking module, 2) a Centrality Estimation module, and 3) a Redundancy Removal module. We evaluate TechSumBot both automatically (i.e., using our benchmark) and manually (i.e., via a user study). The results of both evaluations consistently demonstrate that TechSumBot outperforms the best-performing baseline approaches from both the SE and NLP domains by a large margin: 10.83%–14.90%, 32.75%–36.59%, and 12.61%–17.54% in ROUGE-1, ROUGE-2, and ROUGE-L in the automatic evaluation, and 5.79%–9.23% and 17.03%–17.68% in average usefulness and diversity scores in the human evaluation. This highlights that automatic evaluation on our benchmark can uncover findings similar to those found through user studies, and at a much lower cost, especially when assessing a new approach. Additionally, an ablation study demonstrates that each module contributes to TechSumBot's overall performance. We release the benchmark and the replication package of our experiments at https://github.com/TechSumBot/TechSumBot.
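The description above reports automatic evaluation in terms of ROUGE-1, ROUGE-2, and ROUGE-L against the benchmark's ground-truth summaries. As a rough illustration of how such scores can be computed, here is a minimal sketch (not the authors' evaluation code), assuming Google's open-source rouge-score Python package and two hypothetical placeholder summaries:

from rouge_score import rouge_scorer

# Hypothetical placeholder texts, not drawn from the TechSumBot benchmark.
reference = (
    "Use a virtual environment to isolate dependencies. "
    "Pin package versions in a requirements file."
)
candidate = (
    "Isolate project dependencies with a virtual environment "
    "and pin versions in requirements.txt."
)

# Compute ROUGE-1, ROUGE-2, and ROUGE-L with stemming enabled.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)

for metric, result in scores.items():
    # Each result carries precision, recall, and F1; F1 is the figure
    # most summarization papers report.
    print(f"{metric}: F1 = {result.fmeasure:.4f}")

In practice, such per-pair scores would be averaged over all 111 query-summary pairs in the benchmark to obtain aggregate figures like those quoted above.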
format text
author YANG, Chengran
XU, Bowen
THUNG, Ferdian
SHI, Yucen
ZHANG, Ting
YANG, Zhou
ZHOU, Xin
SHI, Jieke
HE, Junda
HAN, DongGyun
LO, David
author_sort YANG, Chengran
title Answer summarization for technical queries: Benchmark and new approach
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7714
https://ink.library.smu.edu.sg/context/sis_research/article/8717/viewcontent/2209.10868.pdf
_version_ 1779157129642377216