File-level defect prediction: Unsupervised vs. supervised models
Main Authors: YAN, Meng; FANG, Yicheng; LO, David; XIA, Xin; ZHANG, Xiaohong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2017
Subjects: Software Engineering Theory and Algorithms
Online Access: https://ink.library.smu.edu.sg/sis_research/3923
https://ink.library.smu.edu.sg/context/sis_research/article/4925/viewcontent/esem17.pdf
Institution: Singapore Management University
DOI: 10.1109/ESEM.2017.48
Date: 2017-11-01
Collection: Research Collection School Of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract:
Background: Software defect prediction models can help software quality assurance teams allocate testing or code review resources. A variety of techniques, both supervised and unsupervised, have been used to build such models. Recently, Yang et al. [1] found, surprisingly, that unsupervised models can perform statistically significantly better than supervised models in effort-aware change-level defect prediction. However, little is known about the relative performance of unsupervised and supervised models in effort-aware file-level defect prediction.
Goal: Inspired by their work, we investigate whether a similar finding holds for effort-aware file-level defect prediction.
Method: We replicate Yang et al.'s study on the PROMISE dataset, covering ten projects in total, and compare the effectiveness of unsupervised and supervised prediction models for effort-aware file-level defect prediction.
Results: The conclusion of Yang et al. [1] does not hold in the within-project setting, but it does hold in the cross-project setting for file-level defect prediction. In addition, when following the recommendations of the best unsupervised model, developers need to inspect statistically significantly more files than with supervised models for the same inspection effort (measured in LOC).
Conclusions: (a) Unsupervised models do not perform statistically significantly better than the state-of-the-art supervised model in the within-project setting; (b) unsupervised models can perform statistically significantly better than the state-of-the-art supervised model in the cross-project setting; (c) we suggest that evaluations of effort-aware file-level defect prediction models consider not only LOC but also the number of files that need to be inspected.
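The point about inspection effort can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`FileRecord`, `rank_unsupervised`, `rank_by_score`, `effort_aware_eval`) is hypothetical. It assumes a 20% LOC inspection budget, files inspected strictly in ranked order until the next file would exceed that budget, an unsupervised ranking that puts the smallest files first (in the spirit of simple unsupervised models that rank by the reciprocal of a single metric), and made-up scores standing in for a supervised model. The toy project is invented and is not PROMISE data.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    loc: int         # lines of code, used here as the inspection-effort proxy
    defective: bool  # ground-truth label


def rank_unsupervised(files, metric):
    """Rank files so that smaller metric values come first.

    Sorting ascending by the metric gives the same order as sorting
    descending by 1/metric, without risking division by zero.
    """
    return sorted(files, key=metric)


def rank_by_score(files, scores):
    """Rank files by a per-file score, highest first (hypothetical model output)."""
    return sorted(files, key=lambda f: scores[f.name], reverse=True)


def effort_aware_eval(ranked, effort_ratio=0.2):
    """Inspect files in ranked order until the next file would exceed the LOC budget.

    Returns (recall of defective files, number of files inspected).
    """
    total_loc = sum(f.loc for f in ranked)
    budget = effort_ratio * total_loc
    spent = 0
    inspected = []
    for f in ranked:
        if spent + f.loc > budget:
            break  # assumption: stop here rather than skipping the large file
        spent += f.loc
        inspected.append(f)
    total_defective = sum(f.defective for f in ranked)
    found = sum(f.defective for f in inspected)
    recall = found / total_defective if total_defective else 0.0
    return recall, len(inspected)


# Hypothetical toy project: total LOC = 2000, four defective files.
files = [
    FileRecord("A.java", 1000, True),
    FileRecord("B.java", 800, True),
    FileRecord("C.java", 50, False),
    FileRecord("D.java", 40, True),
    FileRecord("E.java", 30, False),
    FileRecord("F.java", 20, False),
    FileRecord("G.java", 60, True),
]

# Illustrative scores standing in for a supervised model's predictions.
model_scores = {"G.java": 0.9, "D.java": 0.8, "B.java": 0.7, "A.java": 0.6,
                "C.java": 0.3, "E.java": 0.2, "F.java": 0.1}

unsup = effort_aware_eval(rank_unsupervised(files, lambda f: f.loc))
sup = effort_aware_eval(rank_by_score(files, model_scores))
print(unsup)  # (0.5, 5)
print(sup)    # (0.5, 2)
```

Under the same 400-LOC budget both rankings find half of the defective files. However, the size-based ranking asks the developer to open five files instead of two. A LOC-only measure hides this difference, which is why reporting the number of inspected files alongside LOC is informative.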