Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
Main Author: | Ebubeogu Amarachukwu Felix |
---|---|
Format: | Thesis |
Published: | 2020 |
Subjects: | |
Online Access: | http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf http://studentsrepo.um.edu.my/14571/ |
Institution: | Universiti Malaya |
Summary: | Software defect prediction provides actionable outputs to software teams while contributing
to industrial success. Therefore, predicting the number of defects in a new version
of software at both the class and method levels is an important goal of defect prediction
studies to assist software teams in optimizing their test efforts towards improving software
quality. However, despite remarkable achievements in defect prediction, the quality of the
data applied in defect prediction studies has been a major concern, with related quality
issues leading to numerous contradictory findings in machine learning research. In addition,
a demonstrated approach for predicting the number of defects in a new software version is
lacking. Therefore, efforts are required to demonstrate how class- and method-level defect
prediction can be achieved for a new software version and to develop an approach for
preprocessing the highly imbalanced class- and method-level data available for software
defect prediction. To address these issues, first, a data preprocessing framework is proposed
to overcome some of the challenges associated with typical software datasets, for instance,
irrelevant and redundant features. The framework is developed by following a
machine-learning-driven, supervised optimal decision procedure; its main advantage is that
it yields bias-free method- and class-level datasets. Second, a method of predicting the
number of software defects in an upcoming product release is proposed. It uses predictor
variables derived from the defect acceleration observed in the existing software defects,
namely, the defect density, defect velocity and defect introduction time. Because this
defect acceleration characterizes the number of defects in the current version of a software
product, the derived predictor variables can be used to construct regression models that
predict the number of software defects in a new version. An experiment
conducted on 69 open-source ELFF Java projects, containing 131,034 classes
and 289,132 methods, as well as on the NASA datasets, which contain 10 different Java
and C++ projects with 22,838 classes, is reported. To evaluate the effectiveness of the
proposed framework for data preprocessing, the average classification performances of
six selected state-of-the-art classifiers before and after data preprocessing are investigated
and compared across multiple projects with data imbalances between the defective and
defect-free classes. For both the class and method levels, these selected state-of-the-art
classifiers, namely, naïve Bayes, logistic regression, neural network, K-nearest neighbors,
support vector machine and random forest classifiers, achieve noteworthy performance
when applied to preprocessed datasets. Moreover, for the ELFF projects, the results at the
class and method levels respectively show correlation coefficients of 61% and 60% for the
defect density, -11% and -4% for the defect introduction time, and 94% and 93% for the
defect velocity (consistent results are also obtained for the NASA datasets, as presented in
the results section). The proposed approach can serve as a blueprint for program testing to
enhance the effectiveness of software development activities.
|
---|
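
The summary reports correlation coefficients for each derived predictor and describes regression models built on them. As a rough illustration of those two steps only, the sketch below computes a Pearson correlation coefficient and fits a one-variable least-squares line in plain Python. It is not the author's model: the record does not give the thesis's definitions of the predictors or its regression setup, so the function names (`pearson_r`, `fit_line`) and every data value here are hypothetical, invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical values, invented for illustration only (not thesis data):
# one predictor value and one observed defect count per software version.
defect_velocity = [1.0, 2.0, 3.0, 4.0, 5.0]
defect_count = [12, 19, 33, 41, 50]

r = pearson_r(defect_velocity, defect_count)
slope, intercept = fit_line(defect_velocity, defect_count)

# Predicted defect count for a hypothetical new version with velocity 6.0.
predicted = slope * 6.0 + intercept
print(f"r = {r:.3f}, predicted defects = {predicted:.1f}")
```

A multi-predictor model (density, velocity and introduction time together) would need multiple regression rather than this single-variable fit; the one-variable form is used here only to keep the sketch dependency-free.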