An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction

Bibliographic Details
Main Authors: Abaei, Golnoush, Selamat, Ali, Fujita, Hamido
Format: Article
Published: Elsevier 2015
Online Access: http://eprints.utm.my/id/eprint/57752/
http://dx.doi.org/10.1016/j.knosys.2014.10.017
Institution: Universiti Teknologi Malaysia
Description
Summary: Software testing is a crucial task in the software development process, with the potential to save time and budget by recognizing defects as early as possible and delivering a product with fewer defects. To improve the testing process, fault prediction approaches identify the parts of the system that are more defect prone. However, when defect data or quality-based class labels are not available, or the company does not have similar or earlier versions of the software project, researchers cannot use supervised classification methods for defect detection. To detect the defect proneness of modules in software projects with high accuracy and to improve the generalization ability of the detection model, we propose an automated software fault detection model using a semi-supervised hybrid self-organizing map (HySOM). HySOM is a semi-supervised model based on the self-organizing map and the artificial neural network. Its advantage is the ability to predict the labels of modules in a semi-supervised manner, using software measurement threshold values in the absence of quality data. In semi-supervised HySOM, the role of the expert in identifying fault-prone modules becomes less critical and more supportive. We have benchmarked the proposed model on eight industrial data sets from NASA and from Turkish white-goods embedded controller software. For the NASA data sets, the results show an improvement in false negative rate in 80% of the cases and in overall error rate in 60% of the cases. We also compare the performance of the proposed model with other recently proposed methods. According to the results, our semi-supervised model can be used as an automated tool to guide testing effort by prioritizing modules according to their defect proneness, improving the quality of software development and software testing with less time and budget.
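The summary mentions two ideas that a short example may make concrete: labeling modules from software measurement thresholds when no quality data exists, and scoring predictions by false negative rate and overall error rate. The Python sketch below is only an illustration under stated assumptions, not the HySOM model itself: the metric names, the cut-off values in `THRESHOLDS`, the "any metric exceeded" rule, and the function names are all hypothetical and do not come from the article.

```python
from typing import Dict, List

# Hypothetical metric thresholds, for illustration only. The record does not
# state which metrics or cut-off values the authors use.
THRESHOLDS: Dict[str, float] = {
    "loc": 65.0,
    "cyclomatic_complexity": 10.0,
    "unique_operators": 25.0,
}


def label_by_thresholds(
    module: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
    min_exceeded: int = 1,
) -> bool:
    """Return True (fault prone) if at least `min_exceeded` metrics exceed
    their threshold. This rule is an assumption, not the authors' procedure."""
    exceeded = sum(
        1 for name, limit in thresholds.items() if module.get(name, 0.0) > limit
    )
    return exceeded >= min_exceeded


def false_negative_rate(actual: List[bool], predicted: List[bool]) -> float:
    """FNR = FN / (FN + TP): share of truly faulty modules that were missed."""
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    return fn / (fn + tp) if (fn + tp) else 0.0


def overall_error_rate(actual: List[bool], predicted: List[bool]) -> float:
    """Overall error = (FP + FN) / total: share of misclassified modules."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual) if actual else 0.0


if __name__ == "__main__":
    # Toy modules with made-up metric values.
    modules = [
        {"loc": 120, "cyclomatic_complexity": 14, "unique_operators": 30},
        {"loc": 20, "cyclomatic_complexity": 3, "unique_operators": 8},
        {"loc": 70, "cyclomatic_complexity": 4, "unique_operators": 10},
        {"loc": 40, "cyclomatic_complexity": 9, "unique_operators": 12},
    ]
    actual = [True, False, False, True]  # made-up ground truth
    predicted = [label_by_thresholds(m) for m in modules]
    print("predicted:", predicted)
    print("FNR:", false_negative_rate(actual, predicted))
    print("error rate:", overall_error_rate(actual, predicted))
```

In fault prediction a false negative (a faulty module predicted as clean) is usually the costlier mistake, because the module escapes testing; this is one plausible reason to report the false negative rate separately from the overall error rate.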