Predicting Effectiveness of IR-Based Bug Localization Techniques

Recently, many information retrieval (IR) based bug localization approaches have been proposed in the literature. These approaches use information retrieval techniques to process a textual bug report and a collection of source code files to find buggy files. They output a ranked list of files sorted...

Bibliographic Details
Main Authors:	LE, Tien-Duy B.; THUNG, Ferdian; LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2014
Subjects:	Bug Localization; Bug Reports; Effectiveness Prediction; Information Retrieval; Text Classification; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/2431 https://ink.library.smu.edu.sg/context/sis_research/article/3431/viewcontent/issre14.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-3431
record_format	dspace
spelling	sg-smu-ink.sis_research-3431 2020-12-07T08:59:03Z Predicting Effectiveness of IR-Based Bug Localization Techniques LE, Tien-Duy B. THUNG, Ferdian LO, David Recently, many information retrieval (IR) based bug localization approaches have been proposed in the literature. These approaches use information retrieval techniques to process a textual bug report and a collection of source code files to find buggy files. They output a ranked list of files sorted by their likelihood to contain the bug. Recent approaches can achieve reasonable accuracy; however, even a state-of-the-art bug localization tool outputs many ranked lists where buggy files appear very low in the lists. This potentially causes developers to distrust bug localization tools. Parnin and Orso recently conducted a user study and highlighted that developers do not find an automated debugging tool useful if they do not find the root cause of a bug early in a ranked list. To address this problem, we build an oracle that can automatically predict whether a ranked list produced by an IR-based bug localization tool is likely to be effective or not. We consider a ranked list to be effective if a buggy file appears in the top-N positions of the list. If a ranked list is unlikely to be effective, developers do not need to waste time checking the recommended files one by one. In such cases, it is better for developers to use traditional debugging methods or request further information to localize bugs. To build this oracle, our approach extracts features that can be divided into four categories: score features, textual features, topic model features, and metadata features. We build a separate prediction model for each category, and combine them to create a composite prediction model which is used as the oracle. We name our proposed approach APRILE, which stands for Automated Prediction of IR-based Bug Localization's Effectiveness. We have evaluated APRILE to predict the effectiveness of three state-of-the-art IR-based bug localization tools on more than three thousand bug reports from AspectJ, Eclipse, and SWT. APRILE can achieve an average precision, recall, and F-measure of at least 70.36%, 66.94%, and 68.03%, respectively. Furthermore, APRILE outperforms a baseline approach by 84.48%, 17.74%, and 31.56% for the AspectJ, Eclipse, and SWT bug reports, respectively. 2014-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/2431 info:doi/10.1109/ISSRE.2014.39 https://ink.library.smu.edu.sg/context/sis_research/article/3431/viewcontent/issre14.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Bug Localization Bug Reports Effectiveness Prediction Information Retrieval Text Classification Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Bug Localization; Bug Reports; Effectiveness Prediction; Information Retrieval; Text Classification; Software Engineering
spellingShingle	Bug Localization Bug Reports Effectiveness Prediction Information Retrieval Text Classification Software Engineering LE, Tien-Duy B. THUNG, Ferdian LO, David Predicting Effectiveness of IR-Based Bug Localization Techniques
description	Recently, many information retrieval (IR) based bug localization approaches have been proposed in the literature. These approaches use information retrieval techniques to process a textual bug report and a collection of source code files to find buggy files. They output a ranked list of files sorted by their likelihood to contain the bug. Recent approaches can achieve reasonable accuracy; however, even a state-of-the-art bug localization tool outputs many ranked lists where buggy files appear very low in the lists. This potentially causes developers to distrust bug localization tools. Parnin and Orso recently conducted a user study and highlighted that developers do not find an automated debugging tool useful if they do not find the root cause of a bug early in a ranked list. To address this problem, we build an oracle that can automatically predict whether a ranked list produced by an IR-based bug localization tool is likely to be effective or not. We consider a ranked list to be effective if a buggy file appears in the top-N positions of the list. If a ranked list is unlikely to be effective, developers do not need to waste time checking the recommended files one by one. In such cases, it is better for developers to use traditional debugging methods or request further information to localize bugs. To build this oracle, our approach extracts features that can be divided into four categories: score features, textual features, topic model features, and metadata features. We build a separate prediction model for each category, and combine them to create a composite prediction model which is used as the oracle. We name our proposed approach APRILE, which stands for Automated Prediction of IR-based Bug Localization's Effectiveness. We have evaluated APRILE to predict the effectiveness of three state-of-the-art IR-based bug localization tools on more than three thousand bug reports from AspectJ, Eclipse, and SWT. APRILE can achieve an average precision, recall, and F-measure of at least 70.36%, 66.94%, and 68.03%, respectively. Furthermore, APRILE outperforms a baseline approach by 84.48%, 17.74%, and 31.56% for the AspectJ, Eclipse, and SWT bug reports, respectively.
format	text
author	LE, Tien-Duy B. THUNG, Ferdian LO, David
author_facet	LE, Tien-Duy B. THUNG, Ferdian LO, David
author_sort	LE, Tien-Duy B.
title	Predicting Effectiveness of IR-Based Bug Localization Techniques
title_short	Predicting Effectiveness of IR-Based Bug Localization Techniques
title_full	Predicting Effectiveness of IR-Based Bug Localization Techniques
title_fullStr	Predicting Effectiveness of IR-Based Bug Localization Techniques
title_full_unstemmed	Predicting Effectiveness of IR-Based Bug Localization Techniques
title_sort	predicting effectiveness of ir-based bug localization techniques
publisher	Institutional Knowledge at Singapore Management University
publishDate	2014
url	https://ink.library.smu.edu.sg/sis_research/2431 https://ink.library.smu.edu.sg/context/sis_research/article/3431/viewcontent/issre14.pdf
_version_	1770572144317562880
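
Illustrative note: the description above defines a ranked list as effective when a buggy file appears within its top N positions, and reports precision, recall, and F-measure for predicting that label. The Python sketch below shows one way those two ideas could be expressed. It is not taken from the paper: the function names, the file names, and the use of the harmonic-mean F-measure (F1) are assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Set, Tuple


def is_effective(ranked_files: Sequence[str], buggy_files: Set[str], n: int) -> bool:
    """Return True if at least one known buggy file is among the first n ranked files."""
    return any(name in buggy_files for name in ranked_files[:n])


def precision_recall_f(
    predicted: Iterable[bool], actual: Iterable[bool]
) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for the 'effective' class.

    Assumption: F-measure is the harmonic mean of precision and recall (F1).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted effective, truly effective
    fp = sum(1 for p, a in pairs if p and not a)    # predicted effective, actually not
    fn = sum(1 for p, a in pairs if not p and a)    # missed an effective list
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical ranked list and ground truth for a single bug report.
    ranked = ["A.java", "B.java", "C.java"]
    buggy = {"C.java"}
    print(is_effective(ranked, buggy, n=1))
    print(is_effective(ranked, buggy, n=3))
```

In this sketch, an oracle's predictions would be compared against the labels produced by is_effective, and precision_recall_f would summarize how often the oracle is right about the "effective" class.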

Similar Items