Will this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization tools

Information retrieval (IR) based bug localization approaches process a textual bug report and a collection of source code files to find buggy files. They output a ranked list of files sorted by their likelihood to contain the bug. Recently, several IR-based bug localization tools have been proposed....

Bibliographic Details
Main Authors: LE, Tien-Duy B., THUNG, Ferdian, LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
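Subjects: Bug localization; Bug reports; Effectiveness prediction; Information retrieval; Text classification; Computer Sciences; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/3704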
https://ink.library.smu.edu.sg/context/sis_research/article/4706/viewcontent/LocalizationToolBug_2017_afv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4706
record_format dspace
spelling sg-smu-ink.sis_research-4706 2020-01-23T08:15:40Z
2017-08-01T07:00:00Z
text
application/pdf
info:doi/10.1007/s10664-016-9484-y
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Bug localization
Bug reports
Effectiveness prediction
Information retrieval
Text classification
Computer Sciences
Software Engineering
description Information retrieval (IR) based bug localization approaches process a textual bug report and a collection of source code files to find buggy files. They output a ranked list of files sorted by their likelihood of containing the bug. Recently, several IR-based bug localization tools have been proposed. However, no tool can successfully localize faults within a small number of the most suspicious program elements for every input bug report. Therefore, it is difficult for developers to decide which tool would be effective for a given bug report. Furthermore, for some bug reports, no bug localization tool would be useful. Even a state-of-the-art bug localization tool outputs many ranked lists in which buggy files appear very low. This potentially causes developers to distrust bug localization tools. In this work, we build an oracle that can automatically predict whether a ranked list produced by an IR-based bug localization tool is likely to be effective. We consider a ranked list to be effective if a buggy file appears in the top-N positions of the list. If a ranked list is unlikely to be effective, developers do not need to waste time checking the recommended files one by one. In such cases, it is better for developers to use traditional debugging methods or to request further information to localize bugs. To build this oracle, our approach extracts features that can be divided into four categories: score features, textual features, topic model features, and metadata features. We build a separate prediction model for each category and combine them to create a composite prediction model, which is used as the oracle. We name this solution APRILE, which stands for Automated PRediction of IR-based Bug Localization’s Effectiveness. We further integrate APRILE with two other components that are learned using our bagging-based ensemble classification (BEC) method. We refer to this extension of APRILE as APRILE+. We have evaluated APRILE+ by predicting the effectiveness of three state-of-the-art IR-based bug localization tools on more than three thousand bug reports from AspectJ, Eclipse, SWT, and Tomcat. APRILE+ achieves an average precision, recall, and F-measure of 77.61%, 88.94%, and 82.09%, respectively. Furthermore, APRILE+ outperforms a baseline approach by Le and Lo and the original APRILE by up to 17.43% and 10.51% in F-measure, respectively.
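The effectiveness criterion and the evaluation metrics named in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `is_effective` and `precision_recall_f1`, the file names, and the sample labels are hypothetical and chosen only for illustration. The sketch assumes a ranked list is a sequence of file paths, most suspicious first, and that the oracle's output is a boolean per bug report.

```python
from typing import Iterable, Sequence, Tuple


def is_effective(ranked_files: Sequence[str], buggy_files: Iterable[str], n: int) -> bool:
    """Return True if at least one buggy file is among the first n ranked files."""
    buggy = set(buggy_files)
    return any(path in buggy for path in ranked_files[:n])


def precision_recall_f1(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float, float]:
    """Score oracle predictions ("effective" = True) against ground-truth labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted effective, was effective
    fp = sum(1 for p, a in pairs if p and not a)    # predicted effective, was not
    fn = sum(1 for p, a in pairs if not p and a)    # predicted ineffective, was effective
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ranked list for one bug report, most suspicious file first.
ranked = ["A.java", "B.java", "C.java"]
label_top2 = is_effective(ranked, {"C.java"}, n=2)  # buggy file is ranked third
label_top3 = is_effective(ranked, {"C.java"}, n=3)
```

The ground-truth label depends on the chosen N, so the same ranked list can count as ineffective for a small N and effective for a larger one.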
format text
author LE, Tien-Duy B.
THUNG, Ferdian
LO, David
title Will this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization tools
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/3704
https://ink.library.smu.edu.sg/context/sis_research/article/4706/viewcontent/LocalizationToolBug_2017_afv.pdf
_version_ 1770573676553437184