The impact of changes mislabeled by SZZ on just-in-time defect prediction

Just-in-Time (JIT) defect prediction, a technique that aims to predict bugs at the change level, has received increasing attention. JIT defect prediction leverages the SZZ approach to identify bug-introducing changes. Recently, researchers found that the performance of SZZ (including its variants) is impacted...

Bibliographic Details
Main Authors:	FAN, Yuanrui; XIA, Xin; COSTA, Daniel A.; LO, David; HASSAN, Ahmed E.; LI, Shanping
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2019
Subjects:	Just-in-Time Defect Prediction; SZZ; Noisy Data; Mining Software Repositories; Data Storage Systems; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4494 https://ink.library.smu.edu.sg/context/sis_research/article/5497/viewcontent/tse194.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5497
record_format	dspace
spelling	sg-smu-ink.sis_research-5497 2019-12-19T06:29:36Z The impact of changes mislabeled by SZZ on just-in-time defect prediction FAN, Yuanrui XIA, Xin COSTA, Daniel A. LO, David HASSAN, Ahmed E. LI, Shanping 2019-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4494 info:doi/10.1109/TSE.2019.2929761 https://ink.library.smu.edu.sg/context/sis_research/article/5497/viewcontent/tse194.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Just-in-Time Defect Prediction SZZ Noisy Data Mining Software Repositories Data Storage Systems Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Just-in-Time Defect Prediction SZZ Noisy Data Mining Software Repositories Data Storage Systems Software Engineering
spellingShingle	Just-in-Time Defect Prediction SZZ Noisy Data Mining Software Repositories Data Storage Systems Software Engineering FAN, Yuanrui XIA, Xin COSTA, Daniel A. LO, David HASSAN, Ahmed E. LI, Shanping The impact of changes mislabeled by SZZ on just-in-time defect prediction
description	Just-in-Time (JIT) defect prediction, a technique that aims to predict bugs at the change level, has received increasing attention. JIT defect prediction leverages the SZZ approach to identify bug-introducing changes. Recently, researchers found that the performance of SZZ (including its variants) is impacted by a large amount of noise. SZZ may mislabel a considerable number of the changes that are used to train a JIT defect prediction model, and thus impact the prediction accuracy. In this paper, we investigate the impact of the changes mislabeled by different SZZ variants on the performance and interpretation of JIT defect prediction models. We analyze four SZZ variants (i.e., B-SZZ, AG-SZZ, MA-SZZ, and RA-SZZ) that were proposed by prior studies. We build the prediction models using the data labeled by these four SZZ variants. Among the four SZZ variants, RA-SZZ is least likely to generate mislabeled changes, so we construct the testing set by using RA-SZZ. All four prediction models are then evaluated on the same testing set. We choose the prediction model built on the data labeled by RA-SZZ as the baseline model, and we compare the performance and metric importance of the models trained using the data labeled by the other three SZZ variants with the baseline model. Through a large-scale empirical study on a total of 126,526 changes from ten Apache open source projects, we find that in terms of various performance measures (AUC, F1-score, G-mean and Recall@20%), the changes mislabeled by B-SZZ and MA-SZZ are not likely to cause a considerable performance reduction, while the changes mislabeled by AG-SZZ cause a statistically significant performance reduction with an average difference of 1%–5%. When considering developers’ inspection effort (measured by LOC) in practice, the changes mislabeled by B-SZZ and AG-SZZ lead to 9%–10% and 1%–15% more wasted inspection effort, respectively. The changes mislabeled by B-SZZ lead to significantly more wasted effort. The changes mislabeled by MA-SZZ do not cause considerably more wasted effort. We also find that the most important metric for identifying bug-introducing changes (i.e., number of files modified in a change) is robust to the mislabeling noise generated by SZZ. However, the second- and third-most important metrics are more likely to be impacted by the mislabeling noise, unless random forest is used as the underlying classifier.
format	text
author	FAN, Yuanrui XIA, Xin COSTA, Daniel A. LO, David HASSAN, Ahmed E. LI, Shanping
author_facet	FAN, Yuanrui XIA, Xin COSTA, Daniel A. LO, David HASSAN, Ahmed E. LI, Shanping
author_sort	FAN, Yuanrui
title	The impact of changes mislabeled by SZZ on just-in-time defect prediction
title_short	The impact of changes mislabeled by SZZ on just-in-time defect prediction
title_full	The impact of changes mislabeled by SZZ on just-in-time defect prediction
title_fullStr	The impact of changes mislabeled by SZZ on just-in-time defect prediction
title_full_unstemmed	The impact of changes mislabeled by SZZ on just-in-time defect prediction
title_sort	impact of changes mislabeled by szz on just-in-time defect prediction
publisher	Institutional Knowledge at Singapore Management University
publishDate	2019
url	https://ink.library.smu.edu.sg/sis_research/4494 https://ink.library.smu.edu.sg/context/sis_research/article/5497/viewcontent/tse194.pdf
_version_	1770574875082096640

Similar Items