Software defect prediction framework based on hybrid metaheuristic optimization methods
A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in terms of both quality and cost. Accurate prediction of defect-prone software modules can assist testing effort, reduce costs, and improve the quality of softwa...
Main Author: | Wahono, Romi Satria |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Subjects: | Q Science (General); QA Mathematics |
Online Access: | http://eprints.utem.edu.my/id/eprint/16874/1/Software%20Defect%20Prediction%20Framework%20Based%20On%20Hybrid%20Metaheuristic%20Optimization%20Methods.pdf http://eprints.utem.edu.my/id/eprint/16874/2/Software%20defect%20prediction%20framework%20based%20on%20hybrid%20metaheuristic%20optimization%20methods.pdf http://eprints.utem.edu.my/id/eprint/16874/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96192 |
Institution: | Universiti Teknikal Malaysia Melaka |
id |
my.utem.eprints.16874 |
---|---|
record_format |
eprints |
spelling |
my.utem.eprints.16874 2022-06-02T10:39:31Z http://eprints.utem.edu.my/id/eprint/16874/ Wahono, Romi Satria (2015) Software defect prediction framework based on hybrid metaheuristic optimization methods. Doctoral thesis, Universiti Teknikal Malaysia Melaka. Thesis NonPeerReviewed https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96192 |
institution |
Universiti Teknikal Malaysia Melaka |
building |
UTEM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknikal Malaysia Melaka |
content_source |
UTEM Institutional Repository |
url_provider |
http://eprints.utem.edu.my/ |
language |
English |
topic |
Q Science (General) QA Mathematics |
description |
A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in terms of both quality and cost. Accurate prediction of defect-prone software modules can assist testing effort, reduce costs, and improve the quality of software. Classification is a popular machine learning approach to software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem. Comparison and benchmarking results for defect prediction with machine learning classifiers indicate that accuracy is generally poor and that no single classifier performs best on all datasets. Two main problems affect classification performance in software defect prediction: noisy attributes and imbalanced class distributions in the datasets, and the difficulty of selecting optimal classifier parameters. This study proposes a software defect prediction framework that combines metaheuristic optimization methods for feature selection and parameter optimization with meta-learning methods for handling the class imbalance problem, with the aim of improving the accuracy of classification models. The specific research contributions of this thesis are: 1) a comparison framework of classification models for software defect prediction, known as CF-SDP; 2) a hybrid genetic algorithm based feature selection and bagging technique for software defect prediction, known as GAFS+B; 3) a hybrid particle swarm optimization based feature selection and bagging technique for software defect prediction, known as PSOFS+B; and 4) a hybrid genetic algorithm based neural network parameter optimization and bagging technique for software defect prediction, known as NN-GAPO+B. Ten classification algorithms were selected for this study, with the aim of achieving a balance among the established classification algorithms used in software defect prediction. The proposed framework and methods were evaluated using state-of-the-art datasets from the NASA Metrics Data Program (MDP) repository. The results indicate that the proposed methods (GAFS+B, PSOFS+B and NN-GAPO+B) yield a marked improvement in software defect prediction performance. GAFS+B and PSOFS+B significantly improved the performance of classifiers that suffer from class imbalance, such as C4.5 and CART. GAFS+B and PSOFS+B also outperformed existing software defect prediction frameworks on most datasets. In the experiments conducted, logistic regression performed best on most of the NASA MDP datasets, with or without feature selection. The proposed methods also identified the features most relevant to software defect prediction. The top ten are branch count, decision density, Halstead level of a module, number of operands in a module, maintenance severity, number of blank LOC, Halstead volume, number of unique operands in a module, total number of LOC, and design density. |
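The general idea behind a genetic algorithm based feature selection combined with bagging, as described for GAFS+B, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the synthetic data generator `make_data`, the nearest-centroid base learner, validation accuracy as the fitness function, and all GA settings (population size, tournament selection, one-point crossover, 10% bit-flip mutation, elitism). The abstract does not state the operators, base classifiers, or evaluation metric actually used.

```python
import random

def make_data(n, rng):
    """Hypothetical imbalanced data: 2 informative features, 4 noise features."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0  # roughly 20% "defective"
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def fit_centroids(X, y, cols):
    """Stand-in base learner: per-class mean of the selected columns."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return model

def predict_centroids(model, cols, x):
    def sq_dist(center):
        return sum((x[j] - c) ** 2 for j, c in zip(cols, center))
    return min(model, key=lambda label: sq_dist(model[label]))

def fit_bagging(X, y, cols, n_models, rng):
    """Bagging: train each base learner on a bootstrap resample."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx], cols))
    return models

def predict_bagging(models, cols, x):
    votes = [predict_centroids(m, cols, x) for m in models]
    return max(sorted(set(votes)), key=votes.count)  # majority vote

def fitness(mask, train, valid, rng_seed):
    """Validation accuracy of a bagged model restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset is never useful
    models = fit_bagging(train[0], train[1], cols, 5, random.Random(rng_seed))
    hits = sum(predict_bagging(models, cols, x) == t for x, t in zip(*valid))
    return hits / len(valid[1])

def ga_feature_selection(train, valid, n_features, pop_size=12, generations=15, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct mask is evaluated only once
            cache[key] = fitness(mask, train, valid, rng_seed=seed)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < 0.1 else g for g in child]  # mutation
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=score)
    return best, score(best)

if __name__ == "__main__":
    X, y = make_data(150, random.Random(1))
    train, valid = (X[:100], y[:100]), (X[100:], y[100:])
    mask, acc = ga_feature_selection(train, valid, n_features=6)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("validation accuracy:", acc)
```

In this sketch the fitness of a mask is computed with a fixed bootstrap seed, so the search is deterministic for a given `seed`. Accuracy is a weak measure on imbalanced data; a real experiment would likely substitute a metric such as AUC and an established classifier for the stand-in learner.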
format |
Thesis |
author |
Wahono, Romi Satria |
title |
Software defect prediction framework based on hybrid metaheuristic optimization methods |
publishDate |
2015 |
url |
http://eprints.utem.edu.my/id/eprint/16874/1/Software%20Defect%20Prediction%20Framework%20Based%20On%20Hybrid%20Metaheuristic%20Optimization%20Methods.pdf http://eprints.utem.edu.my/id/eprint/16874/2/Software%20defect%20prediction%20framework%20based%20on%20hybrid%20metaheuristic%20optimization%20methods.pdf http://eprints.utem.edu.my/id/eprint/16874/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96192 |