File-level defect prediction: Unsupervised vs. supervised models

Background: Software defect models can help software quality assurance teams to allocate testing or code review resources. A variety of techniques have been used to build defect prediction models, including supervised and unsupervised methods. Recently, Yang et al. [1] surprisingly find that unsuper...

Bibliographic Details
Main Authors:	YAN, Meng, FANG, Yicheng, LO, David, XIA, Xin, ZHANG, Xiaohong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2017
Subjects:	Software Engineering Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/3923 https://ink.library.smu.edu.sg/context/sis_research/article/4925/viewcontent/esem17.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4925
record_format	dspace
spelling	sg-smu-ink.sis_research-4925 2020-07-22T07:41:21Z File-level defect prediction: Unsupervised vs. supervised models YAN, Meng FANG, Yicheng LO, David XIA, Xin ZHANG, Xiaohong Background: Software defect models can help software quality assurance teams to allocate testing or code review resources. A variety of techniques have been used to build defect prediction models, including supervised and unsupervised methods. Recently, Yang et al. [1] surprisingly found that unsupervised models can perform statistically significantly better than supervised models in effort-aware change-level defect prediction. However, little is known about the relative performance of unsupervised and supervised models for effort-aware file-level defect prediction. Goal: Inspired by their work, we aim to investigate whether a similar finding holds in effort-aware file-level defect prediction. Method: We replicate Yang et al.'s study on the PROMISE dataset with ten projects in total. We compare the effectiveness of unsupervised and supervised prediction models for effort-aware file-level defect prediction. Results: We find that the conclusion of Yang et al. [1] does not hold under the within-project setting but holds under the cross-project setting for file-level defect prediction. In addition, when following the recommendations given by the best unsupervised model, developers need to inspect statistically significantly more files than with supervised models for the same inspection effort (i.e., LOC). Conclusions: (a) Unsupervised models do not perform statistically significantly better than the state-of-the-art supervised model under the within-project setting, (b) Unsupervised models can perform statistically significantly better than the state-of-the-art supervised model under the cross-project setting, (c) We suggest that not only LOC but also the number of files that need to be inspected should be considered when evaluating effort-aware file-level defect prediction models.
2017-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3923 info:doi/10.1109/ESEM.2017.48 https://ink.library.smu.edu.sg/context/sis_research/article/4925/viewcontent/esem17.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Software Engineering Theory and Algorithms
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Software Engineering Theory and Algorithms
spellingShingle	Software Engineering Theory and Algorithms YAN, Meng FANG, Yicheng LO, David XIA, Xin ZHANG, Xiaohong File-level defect prediction: Unsupervised vs. supervised models
description	Background: Software defect models can help software quality assurance teams to allocate testing or code review resources. A variety of techniques have been used to build defect prediction models, including supervised and unsupervised methods. Recently, Yang et al. [1] surprisingly found that unsupervised models can perform statistically significantly better than supervised models in effort-aware change-level defect prediction. However, little is known about the relative performance of unsupervised and supervised models for effort-aware file-level defect prediction. Goal: Inspired by their work, we aim to investigate whether a similar finding holds in effort-aware file-level defect prediction. Method: We replicate Yang et al.'s study on the PROMISE dataset with ten projects in total. We compare the effectiveness of unsupervised and supervised prediction models for effort-aware file-level defect prediction. Results: We find that the conclusion of Yang et al. [1] does not hold under the within-project setting but holds under the cross-project setting for file-level defect prediction. In addition, when following the recommendations given by the best unsupervised model, developers need to inspect statistically significantly more files than with supervised models for the same inspection effort (i.e., LOC). Conclusions: (a) Unsupervised models do not perform statistically significantly better than the state-of-the-art supervised model under the within-project setting, (b) Unsupervised models can perform statistically significantly better than the state-of-the-art supervised model under the cross-project setting, (c) We suggest that not only LOC but also the number of files that need to be inspected should be considered when evaluating effort-aware file-level defect prediction models.
format	text
author	YAN, Meng FANG, Yicheng LO, David XIA, Xin ZHANG, Xiaohong
author_facet	YAN, Meng FANG, Yicheng LO, David XIA, Xin ZHANG, Xiaohong
author_sort	YAN, Meng
title	File-level defect prediction: Unsupervised vs. supervised models
title_short	File-level defect prediction: Unsupervised vs. supervised models
title_full	File-level defect prediction: Unsupervised vs. supervised models
title_fullStr	File-level defect prediction: Unsupervised vs. supervised models
title_full_unstemmed	File-level defect prediction: Unsupervised vs. supervised models
title_sort	file-level defect prediction: unsupervised vs. supervised models
publisher	Institutional Knowledge at Singapore Management University
publishDate	2017
url	https://ink.library.smu.edu.sg/sis_research/3923 https://ink.library.smu.edu.sg/context/sis_research/article/4925/viewcontent/esem17.pdf
_version_	1770573936614965248

Similar Items