Software defect prediction framework based on hybrid metaheuristic optimization methods

Bibliographic Details
Main Author: Wahono, Romi Satria
Format: Thesis
Language: English
Published: 2015
Online Access: http://eprints.utem.edu.my/id/eprint/16874/1/Software%20Defect%20Prediction%20Framework%20Based%20On%20Hybrid%20Metaheuristic%20Optimization%20Methods.pdf
http://eprints.utem.edu.my/id/eprint/16874/2/Software%20defect%20prediction%20framework%20based%20on%20hybrid%20metaheuristic%20optimization%20methods.pdf
http://eprints.utem.edu.my/id/eprint/16874/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96192
Institution: Universiti Teknikal Malaysia Melaka
Citation: Wahono, Romi Satria (2015) Software defect prediction framework based on hybrid metaheuristic optimization methods. Doctoral thesis, Universiti Teknikal Malaysia Melaka.
Subjects: Q Science (General); QA Mathematics
Description:
A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive, affecting both software quality and cost. Accurate prediction of defect-prone software modules can help focus testing effort, reduce costs, and improve software quality. Classification is a popular machine learning approach to software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem. First, comparison and benchmarking of machine learning classifiers for defect prediction show that accuracy is generally poor and that no single classifier performs best on all datasets. In addition, two main problems affect classification performance in software defect prediction: noisy attributes and imbalanced class distributions in the datasets, and the difficulty of selecting optimal classifier parameters. This study proposes a software defect prediction framework that combines metaheuristic optimization methods for feature selection and parameter optimization with meta-learning methods for handling class imbalance in the datasets, with the aim of improving the accuracy of classification models.

The specific research contributions of this thesis are: 1) CF-SDP, a comparison framework of classification models for software defect prediction; 2) GAFS+B, a hybrid technique combining genetic algorithm (GA) based feature selection with bagging; 3) PSOFS+B, a hybrid technique combining particle swarm optimization (PSO) based feature selection with bagging; and 4) NN-GAPO+B, a hybrid technique combining genetic algorithm based neural network parameter optimization with bagging. Ten classification algorithms were selected for this study, with the aim of achieving a balanced selection of the classification algorithms established in software defect prediction. The framework and methods are evaluated on benchmark datasets from the NASA Metrics Data Program (MDP) repository. The results indicate that the proposed methods (GAFS+B, PSOFS+B and NN-GAPO+B) substantially improve the performance of software defect prediction. GAFS+B and PSOFS+B significantly improved the performance of classifiers affected by class imbalance, such as C4.5 and CART, and outperformed existing software defect prediction frameworks on most datasets. In the experiments, logistic regression performed best on most of the NASA MDP datasets, both with and without feature selection. The proposed methods also identify the features most relevant to software defect prediction. The ten most relevant features are branch count, decision density, the Halstead level metric of a module, the number of operands in a module, maintenance severity, the number of blank lines of code (LOC), Halstead volume, the number of unique operands in a module, the total LOC, and design density.
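The GAFS+B idea summarized above pairs a genetic algorithm that searches over feature subsets with a bagged classifier inside the fitness evaluation. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The data are synthetic. The base learner is a nearest-centroid classifier rather than one of the ten classifiers studied. Fitness is balanced accuracy on a single holdout split. The GA settings (population size, tournament size, mutation rate) are arbitrary. All function names are hypothetical.

```python
# Hypothetical sketch of GA-based feature selection wrapped around bagging
# (GAFS+B-style). Not the thesis code: data, base learner, fitness and GA
# settings are all illustrative assumptions.
import random

def make_data(n=300, n_features=6, pos_rate=0.2, seed=0):
    """Synthetic, imbalanced data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            row[0] += 1.5  # defect-prone modules shifted on feature 0
            row[1] += 1.0  # and, more weakly, on feature 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_fit(X, y, mask):
    """Nearest-centroid base learner using only the features selected by mask."""
    idx = [j for j, bit in enumerate(mask) if bit]
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            return None  # bootstrap sample lacks one class; caller redraws
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    return idx, cents

def centroid_predict(model, x):
    idx, cents = model
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(idx, c))
    return min((0, 1), key=lambda label: dist(cents[label]))

def bagging_predict(X_tr, y_tr, X_te, mask, n_estimators=11, seed=1):
    """Bagging: fit base learners on bootstrap samples, then majority-vote."""
    rng = random.Random(seed)  # fixed seed, so the same mask always scores the same
    n, models = len(X_tr), []
    while len(models) < n_estimators:
        picks = [rng.randrange(n) for _ in range(n)]
        model = centroid_fit([X_tr[i] for i in picks], [y_tr[i] for i in picks], mask)
        if model is not None:
            models.append(model)
    return [1 if 2 * sum(centroid_predict(m, x) for m in models) > len(models) else 0
            for x in X_te]

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; less misleading than plain accuracy on imbalanced data."""
    recalls = []
    for label in (0, 1):
        hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
        recalls.append(sum(hits) / len(hits) if hits else 0.0)
    return sum(recalls) / 2

def fitness(mask, split):
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    X_tr, y_tr, X_te, y_te = split
    return balanced_accuracy(y_te, bagging_predict(X_tr, y_tr, X_te, mask))

def ga_feature_selection(split, n_features, pop_size=12, generations=15, p_mut=0.1, seed=2):
    """GA over feature bitmasks; each mask is scored by the bagged classifier."""
    rng = random.Random(seed)
    pop = [[1] * n_features]  # seed the population with the all-features baseline
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    cache = {}
    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, split)
        return cache[key]
    for _ in range(generations):
        nxt = [max(pop, key=fit)[:]]  # elitism: the best mask survives unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fit)  # tournament selection
            b = max(rng.sample(pop, 3), key=fit)
            cut = rng.randrange(1, n_features)  # single-point crossover point
            # crossover, then bit-flip mutation with probability p_mut per gene
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)

X, y = make_data()
split = (X[:200], y[:200], X[200:], y[200:])
best_mask, best_score = ga_feature_selection(split, n_features=6)
baseline = fitness([1] * 6, split)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print(f"balanced accuracy: {best_score:.3f} (all features: {baseline:.3f})")
```

Seeding the initial population with the all-features mask and keeping the best mask each generation (elitism) means the returned subset never scores below the all-features baseline on this particular holdout split. That says nothing about generalization. A faithful evaluation would use cross-validation and the thesis's own fitness measure and classifiers.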