On the influence of biases in bug localization: evaluation and benchmark

Bug localization is the task of identifying parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report dataset...

Bibliographic Details
Main Authors:	WIDYASARI, Ratnadira; HARYONO, Stefanus Agus; THUNG, Ferdian; SHI, Jieke; TAN, Constance; WEE, Fiona; PHAN, Jack; LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Bias; Bug localization; Bug report; Python; Artificial Intelligence and Robotics; Databases and Information Systems; Information Security; Programming Languages and Compilers
Online Access:	https://ink.library.smu.edu.sg/sis_research/7655 https://ink.library.smu.edu.sg/context/sis_research/article/8658/viewcontent/On_the_influence_of_biases_in_bug_localization_evaluation_and_benchmark.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8658
record_format	dspace
spelling	sg-smu-ink.sis_research-8658 2023-01-10T03:47:19Z On the influence of biases in bug localization: evaluation and benchmark WIDYASARI, Ratnadira; HARYONO, Stefanus Agus; THUNG, Ferdian; SHI, Jieke; TAN, Constance; WEE, Fiona; PHAN, Jack; LO, David Bug localization is the task of identifying parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug report, (2) already localized bug report, and (3) incorrect ground truth file in a bug report. They reported that already localized bug reports statistically significantly and substantially impact bug localization results, and thus should be removed. However, their evaluation is still limited, as they only investigated 3 projects written in Java. In this study, we replicate the study of Kochhar et al. on the effect of biases in bug report datasets for bug localization. Further investigation on this topic is necessary as new and larger bug report datasets have been proposed without being checked for these biases. We conduct our analysis on a collection of 2,913 bug reports taken from the recently released Bugzbook dataset that fix Python files. To investigate the prevalence of the biases, we check the bias distributions. For each bias, we select and label a set of bug reports that may contain the bias and compute the proportion of bug reports in the set that exhibit the bias. We find that 5%, 23%, and 30% of the bug reports that we investigated are affected by biases 1, 2, and 3, respectively. Then, we investigate the effect of the three biases on bug localization by measuring the performance of IncBL, a recent bug localization tool, and the classical Vector Space Model (VSM) based bug localization tool, which was used in the Kochhar et al. study. Our experiment results highlight that bias 2 significantly impacts the bug localization results, while biases 1 and 3 do not have a significant impact. We also find that the effect sizes of bias 2 on IncBL and VSM are different, where IncBL has a higher effect size than VSM. Our findings corroborate the result reported by Kochhar et al. and demonstrate that bias 2 not only affects the 3 Java projects investigated in their study, but also others in another programming language (i.e., Python). This highlights the need to eliminate bias 2 from the evaluation of future bug localization tools. As a by-product of our replication study, we have released a benchmark dataset, which we refer to as CAPTURED, that has been cleaned of the three biases. CAPTURED contains Python programs and therefore augments the cleaned dataset released by Kochhar et al., which only contains Java programs. 2022-03-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7655 info:doi/10.1109/SANER53432.2022.00027 https://ink.library.smu.edu.sg/context/sis_research/article/8658/viewcontent/On_the_influence_of_biases_in_bug_localization_evaluation_and_benchmark.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Bias; Bug localization; Bug report; Python; Artificial Intelligence and Robotics; Databases and Information Systems; Information Security; Programming Languages and Compilers
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Bias; Bug localization; Bug report; Python; Artificial Intelligence and Robotics; Databases and Information Systems; Information Security; Programming Languages and Compilers
spellingShingle	Bias; Bug localization; Bug report; Python; Artificial Intelligence and Robotics; Databases and Information Systems; Information Security; Programming Languages and Compilers WIDYASARI, Ratnadira; HARYONO, Stefanus Agus; THUNG, Ferdian; SHI, Jieke; TAN, Constance; WEE, Fiona; PHAN, Jack; LO, David On the influence of biases in bug localization: evaluation and benchmark
description	Bug localization is the task of identifying parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug report, (2) already localized bug report, and (3) incorrect ground truth file in a bug report. They reported that already localized bug reports statistically significantly and substantially impact bug localization results, and thus should be removed. However, their evaluation is still limited, as they only investigated 3 projects written in Java. In this study, we replicate the study of Kochhar et al. on the effect of biases in bug report datasets for bug localization. Further investigation on this topic is necessary as new and larger bug report datasets have been proposed without being checked for these biases. We conduct our analysis on a collection of 2,913 bug reports taken from the recently released Bugzbook dataset that fix Python files. To investigate the prevalence of the biases, we check the bias distributions. For each bias, we select and label a set of bug reports that may contain the bias and compute the proportion of bug reports in the set that exhibit the bias. We find that 5%, 23%, and 30% of the bug reports that we investigated are affected by biases 1, 2, and 3, respectively. Then, we investigate the effect of the three biases on bug localization by measuring the performance of IncBL, a recent bug localization tool, and the classical Vector Space Model (VSM) based bug localization tool, which was used in the Kochhar et al. study. Our experiment results highlight that bias 2 significantly impacts the bug localization results, while biases 1 and 3 do not have a significant impact. We also find that the effect sizes of bias 2 on IncBL and VSM are different, where IncBL has a higher effect size than VSM. Our findings corroborate the result reported by Kochhar et al. and demonstrate that bias 2 not only affects the 3 Java projects investigated in their study, but also others in another programming language (i.e., Python). This highlights the need to eliminate bias 2 from the evaluation of future bug localization tools. As a by-product of our replication study, we have released a benchmark dataset, which we refer to as CAPTURED, that has been cleaned of the three biases. CAPTURED contains Python programs and therefore augments the cleaned dataset released by Kochhar et al., which only contains Java programs.
format	text
author	WIDYASARI, Ratnadira; HARYONO, Stefanus Agus; THUNG, Ferdian; SHI, Jieke; TAN, Constance; WEE, Fiona; PHAN, Jack; LO, David
author_facet	WIDYASARI, Ratnadira; HARYONO, Stefanus Agus; THUNG, Ferdian; SHI, Jieke; TAN, Constance; WEE, Fiona; PHAN, Jack; LO, David
author_sort	WIDYASARI, Ratnadira
title	On the influence of biases in bug localization: evaluation and benchmark
title_sort	on the influence of biases in bug localization: evaluation and benchmark
publisher	Institutional Knowledge at Singapore Management University
publishDate	2022
url	https://ink.library.smu.edu.sg/sis_research/7655 https://ink.library.smu.edu.sg/context/sis_research/article/8658/viewcontent/On_the_influence_of_biases_in_bug_localization_evaluation_and_benchmark.pdf
_version_	1770576399168438272
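
The abstract above mentions a classical Vector Space Model (VSM) based bug localization tool. As a rough illustration of the general idea only, the sketch below ranks source files by TF-IDF cosine similarity to a bug report. It is not the tool used in the study: the tokenizer, the smoothed IDF weighting, and the names `tokenize` and `rank_files` are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank_files(bug_report, files):
    """Rank source files by TF-IDF cosine similarity to a bug report.

    `files` maps a file path to its text content.
    Returns a list of (path, score) pairs, most similar first.
    """
    docs = {path: Counter(tokenize(text)) for path, text in files.items()}
    n_docs = len(docs)
    # Document frequency: in how many files each term appears.
    df = Counter(term for counts in docs.values() for term in counts)

    def weights(counts):
        # Smoothed IDF keeps weights positive even for terms found in every file.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weights(Counter(tokenize(bug_report)))
    scored = [(path, cosine(query, weights(counts))) for path, counts in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this kind of scoring, a bug report that already names the file to be fixed (bias 2 in the abstract) shares distinctive terms with that file, which is one plausible reason such reports could inflate measured localization performance.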

Similar Items