Combined classifier for cross-project defect prediction: An extended empirical study

To facilitate developers in effective allocation of their testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or...

Bibliographic Details
Main Authors:	ZHANG, Yun; LO, David; XIA, Xin; SUN, Jianling
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2018
Subjects:	defect prediction; cross-project; classifier combination; Software Engineering; Systems Architecture
Online Access:	https://ink.library.smu.edu.sg/sis_research/4128 https://ink.library.smu.edu.sg/context/sis_research/article/5131/viewcontent/Combined_classifier_for_cross_project_defect_prediction.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5131
record_format	dspace
spelling	sg-smu-ink.sis_research-5131 2018-09-21T03:23:34Z Combined classifier for cross-project defect prediction: An extended empirical study ZHANG, Yun LO, David XIA, Xin SUN, Jianling To facilitate developers in effective allocation of their testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or certain other code elements. These techniques are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study, we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with the combined defect predictor where logistic regression is used as the meta classification algorithm (CODEP (Logistic)), which is the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP (Logistic): Maximum voting shows the best performance in terms of F-measure and its average F-measure is superior to that of CODEP (Logistic) by 36.88%. Bootstrap aggregation (Bagging (J48)) shows the best performance in terms of cost effectiveness and its average cost effectiveness is superior to that of CODEP (Logistic) by 15.34%. 2018-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4128 info:doi/10.1007/s11704-017-6015-y https://ink.library.smu.edu.sg/context/sis_research/article/5131/viewcontent/Combined_classifier_for_cross_project_defect_prediction.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University defect prediction;cross-project;classifier combination Software Engineering Systems Architecture
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	defect prediction; cross-project; classifier combination; Software Engineering; Systems Architecture
spellingShingle	defect prediction;cross-project;classifier combination Software Engineering Systems Architecture ZHANG, Yun LO, David XIA, Xin SUN, Jianling Combined classifier for cross-project defect prediction: An extended empirical study
description	To facilitate developers in effective allocation of their testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of classes, methods, or certain other code elements. These techniques are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study, we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with the combined defect predictor where logistic regression is used as the meta classification algorithm (CODEP (Logistic)), which is the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP (Logistic): Maximum voting shows the best performance in terms of F-measure and its average F-measure is superior to that of CODEP (Logistic) by 36.88%. Bootstrap aggregation (Bagging (J48)) shows the best performance in terms of cost effectiveness and its average cost effectiveness is superior to that of CODEP (Logistic) by 15.34%.
format	text
author	ZHANG, Yun LO, David XIA, Xin SUN, Jianling
author_facet	ZHANG, Yun LO, David XIA, Xin SUN, Jianling
author_sort	ZHANG, Yun
title	Combined classifier for cross-project defect prediction: An extended empirical study
title_short	Combined classifier for cross-project defect prediction: An extended empirical study
title_full	Combined classifier for cross-project defect prediction: An extended empirical study
title_fullStr	Combined classifier for cross-project defect prediction: An extended empirical study
title_full_unstemmed	Combined classifier for cross-project defect prediction: An extended empirical study
title_sort	combined classifier for cross-project defect prediction: an extended empirical study
publisher	Institutional Knowledge at Singapore Management University
publishDate	2018
url	https://ink.library.smu.edu.sg/sis_research/4128 https://ink.library.smu.edu.sg/context/sis_research/article/5131/viewcontent/Combined_classifier_for_cross_project_defect_prediction.pdf
_version_	1770574345082503168

Similar Items