Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction

Effort-aware just-in-time (JIT) defect prediction aims at finding more defective software changes with limited code inspection cost. Traditionally, supervised models have been used; however, they require sufficient labelled training data, which is difficult to obtain, especially for new projects. Recently, Yang et al. proposed an unsupervised model (i.e., LT) and applied it to projects with rich historical bug data. Interestingly, they reported that, under the same inspection cost (i.e., 20 percent of the total lines of code modified by all changes), it could find about 12%-27% more defective changes than a state-of-the-art supervised model (i.e., EALR) when using different evaluation settings. This is surprising, as supervised models that benefit from historical data are expected to perform better than unsupervised ones. Their finding suggests that previous studies on defect prediction had made a simple problem too complex. Considering the potential high impact of Yang et al.'s work, in this paper we perform a replication study and present the following new findings: (1) Under the same inspection budget, LT requires developers to inspect a large number of changes, necessitating many more context switches. (2) Although LT finds more defective changes, many highly ranked changes are false alarms. These initial false alarms may negatively impact practitioners' patience and confidence. (3) LT does not outperform EALR when the harmonic mean of Recall and Precision (i.e., F1-score) is considered. Aside from highlighting the above findings, we propose a simple but improved supervised model called CBS+, which leverages the ideas of both EALR and LT. We investigate the performance of CBS+ using three different evaluation settings: time-wise cross-validation, 10-times 10-fold cross-validation, and cross-project validation. When compared with EALR, CBS+ detects about 15%-26% more defective changes, while keeping the number of context switches and initial false alarms close to those of EALR. When compared with LT, the number of defective changes detected by CBS+ is comparable to LT's result, while CBS+ significantly reduces context switches and initial false alarms before first success. Finally, we discuss how to balance the tradeoff between the number of inspected defects and context switches, and present the implications of our findings for practitioners and researchers.
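
To make the evaluation setting in the abstract concrete, below is a minimal Python sketch of the effort-aware protocol it describes: changes, ranked by whichever model is being compared (e.g., EALR or LT), are inspected in order until 20 percent of the total modified lines of code is spent; the sketch then reports recall, precision, F1 (the harmonic mean of precision and recall), the number of inspected changes (a proxy for context switches), and the initial false alarms before the first defective change. The function name, data layout, and toy numbers are illustrative assumptions, not the authors' implementation.

def evaluate_at_budget(ranked_changes, budget_ratio=0.2):
    """Walk a ranked list of (churn, is_defective) pairs under a LOC budget.

    ranked_changes: list of (modified LOC, bool) in the model's ranked order.
    Returns recall, precision, F1, inspected-change count, and the number of
    false alarms seen before the first defective change.
    """
    total_loc = sum(churn for churn, _ in ranked_changes)
    total_defects = sum(1 for _, defective in ranked_changes if defective)
    budget = budget_ratio * total_loc  # 20% of all modified LOC by default

    inspected = hits = initial_false_alarms = 0
    spent = 0
    first_hit_seen = False
    for churn, defective in ranked_changes:
        if spent + churn > budget:  # next change would exceed the budget
            break
        spent += churn
        inspected += 1
        if defective:
            hits += 1
            first_hit_seen = True
        elif not first_hit_seen:
            initial_false_alarms += 1

    recall = hits / total_defects if total_defects else 0.0
    precision = hits / inspected if inspected else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1, inspected, initial_false_alarms

if __name__ == "__main__":
    # Hypothetical data: (modified LOC, is_defective), already model-ranked.
    changes = [(5, False), (8, True), (3, False), (40, True), (120, False)]
    print(evaluate_at_budget(changes))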

Bibliographic Details
Main Authors: HUANG, Qiao; XIA, Xin; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2018
Collection: Research Collection School Of Computing and Information Systems
Subjects: Evaluation metrics; Defect prediction; Research bias; Software Engineering
DOI: 10.1007/s10664-018-9661-2
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://ink.library.smu.edu.sg/sis_research/4355
https://ink.library.smu.edu.sg/context/sis_research/article/5358/viewcontent/Revisting_effort_aware_JIT_DP_emse18_afv.pdf
Institution: Singapore Management University