Combining query reduction and expansion for text-retrieval-based bug localization

Automated text-retrieval-based bug localization (TRBL) techniques normally use the full text of a bug report to formulate a query and retrieve parts of the code that are buggy. Previous research has shown that reducing the size of the query increases the effectiveness of TRBL. On the other hand, res...
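As a rough illustration of the idea in the title, the sketch below first reduces a bug report to a query built from selected parts and then expands that query with extra terms. It is not the authors' Qrex implementation. The part names (`title`, `observed`, `steps`), the function names, and the toy pseudo-relevance-feedback expansion are all assumptions made for this example; the record does not say which expansion technique the paper uses.

```python
from collections import Counter
from itertools import combinations


def reduction_strategies(parts):
    """Yield every non-empty combination of bug-report part names (hypothetical parts)."""
    names = sorted(parts)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            yield combo


def reduce_query(parts, strategy):
    """Build a reduced query from only the parts named in the strategy."""
    return " ".join(parts[name] for name in strategy).lower().split()


def expand_query(query, documents, top_docs=1, extra_terms=2):
    """Toy pseudo-relevance feedback (an assumed stand-in, not the paper's technique)."""
    query_set = set(query)
    # Rank documents by the number of distinct query terms they contain.
    ranked = sorted(
        documents,
        key=lambda doc: len(query_set & set(doc.lower().split())),
        reverse=True,
    )
    # Count terms from the top documents that the query does not already have.
    counts = Counter(
        term
        for doc in ranked[:top_docs]
        for term in doc.lower().split()
        if term not in query_set
    )
    return query + [term for term, _ in counts.most_common(extra_terms)]


# Made-up bug report and code documents, for illustration only.
parts = {
    "title": "Crash when saving file",
    "observed": "NullPointerException in editor",
    "steps": "open file then click save",
}
documents = ["save file writer flush buffer", "editor render paint window"]

strategies = list(reduction_strategies(parts))
reduced = reduce_query(parts, ("title",))
expanded = expand_query(reduced, documents)
```

Each tuple in `strategies` names one candidate reduction, and any of them can be passed through `expand_query` to get a combined reformulation. In a real setting the ranking step would be a proper retrieval engine rather than a term-overlap count.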

Bibliographic Details
Main Authors: FLOREZ, Juan Manuel, CHAPARRO, Oscar, TREUDE, Christoph, MARCUS, Andrian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: bug localization; query expansion; query reduction; query reformulation; software engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/8944
https://ink.library.smu.edu.sg/context/sis_research/article/9947/viewcontent/saner21.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9947
record_format dspace
spelling sg-smu-ink.sis_research-9947
2024-07-04T08:43:42Z
2021-03-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/8944
info:doi/10.1109/SANER50967.2021.00024
https://ink.library.smu.edu.sg/context/sis_research/article/9947/viewcontent/saner21.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic bug localization
query expansion
query reduction
query reformulation
software engineering
Software Engineering
description Automated text-retrieval-based bug localization (TRBL) techniques normally use the full text of a bug report to formulate a query and retrieve the parts of the code that are buggy. Previous research has shown that reducing the size of the query increases the effectiveness of TRBL. On the other hand, researchers also found improvements when expanding the query (i.e., adding more terms). In this paper, we bring these two views together to reformulate queries for TRBL. Specifically, we improve discourse-based query reduction strategies by adopting a combinatorial approach and using task phrases from bug reports, and we combine them with a state-of-the-art query expansion technique, resulting in 970 query reformulation strategies. We investigate the benefits of these strategies for localizing buggy code elements and define a new approach, called Qrex, based on the most effective strategy. We evaluated the reformulation strategies, including Qrex, on 1,217 queries from different software systems to retrieve buggy code artifacts at three code granularities, using five state-of-the-art automated TRBL approaches. The results indicate that Qrex increases TRBL effectiveness by 4% to 12.6% compared to applying query reduction and expansion in isolation, and by 32.1% compared to the no-reformulation baseline.
format text
author FLOREZ, Juan Manuel
CHAPARRO, Oscar
TREUDE, Christoph
MARCUS, Andrian
title Combining query reduction and expansion for text-retrieval-based bug localization
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/8944
https://ink.library.smu.edu.sg/context/sis_research/article/9947/viewcontent/saner21.pdf