Ensemble filters with harmonize algorithm for optimal solutions in medical datasets

Bibliographic Details
Main Author: Tengku Ab. Hamid, Tengku Mazlin
Format: Thesis
Language: English
Published: 2021
Online Access: http://eprints.utm.my/102978/1/TengkuMazlinTengkuAbHamidMSC2021.pdf.pdf
http://eprints.utm.my/102978/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150761
Institution: Universiti Teknologi Malaysia
Description
Summary: The explosive growth in the number of features in high-dimensional datasets remains a challenge for data analysis in many research fields, especially medical diagnosis, where it may affect the treatment patients receive. Besides data dimensionality, classifiers such as the Support Vector Machine (SVM) still lack consistency in achieving optimal performance because of improper kernel parameter settings. Filter algorithms are frequently used to select relevant features because of their simple ranking strategies. However, most independent filter algorithms do not consider the intercorrelation between features, and low dependency between features is the leading reason some features are judged irrelevant. Consequently, an imbalanced number of features is produced, which can degrade classification accuracy. This problem can be alleviated with an ensemble feature selection approach that identifies an appropriate number of features by taking feature dependency into account. In this study, an ensemble filter feature selection method with a harmonize classification algorithm is proposed. The ensemble combines the Information Gain, Gain Ratio, Chi-squared and Relief-F filters with an occurrence rate evaluation to identify the initial top-ranked features relevant for classification. The harmonize classification method uses Particle Swarm Optimization (PSO) and SVM to determine the optimum kernel parameters and the significant features simultaneously as the optimal solution. The proposed method is evaluated on four medical datasets of different sizes in terms of accuracy, sensitivity, specificity, and Area Under the Curve (AUC). Experimental results show that the accuracy of the proposed method with an optimal solution increases significantly on each dataset, to 96.15%, 95.41%, 96.62% and 96.50%, compared with conventional SVM. Under 10-fold cross-validation, the proposed method also shows better classification performance than other existing methods. Therefore, the proposed method is applicable to high-dimensional medical datasets for accurate disease prediction.
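
The occurrence rate evaluation mentioned in the summary can be illustrated with a minimal sketch. The record does not define the thesis's exact rule, so the version below is an assumption: each filter contributes its top-k features, and a feature is kept when it appears in at least a chosen fraction of the filters. The function name `occurrence_rate_selection`, the parameters `top_k` and `min_rate`, and the feature rankings are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def occurrence_rate_selection(rankings, top_k, min_rate):
    """Keep features that appear in the top-k list of at least
    `min_rate` (a fraction between 0 and 1) of the filters.

    `rankings` maps a filter name to its features ordered best first.
    This voting rule is an assumed reading of "occurrence rate".
    """
    counts = Counter()
    for ranked in rankings.values():
        counts.update(ranked[:top_k])  # one vote per filter per feature

    n_filters = len(rankings)
    rates = {feature: count / n_filters for feature, count in counts.items()}

    # Order by descending rate, then by name so the result is deterministic.
    selected = sorted(
        (f for f, r in rates.items() if r >= min_rate),
        key=lambda f: (-rates[f], f),
    )
    return selected, rates


# Hypothetical rankings for illustration only (not results from the thesis).
rankings = {
    "information_gain": ["f3", "f1", "f7", "f2", "f5"],
    "gain_ratio": ["f1", "f3", "f2", "f7", "f4"],
    "chi_squared": ["f3", "f7", "f1", "f6", "f2"],
    "relief_f": ["f7", "f3", "f5", "f1", "f2"],
}

selected, rates = occurrence_rate_selection(rankings, top_k=3, min_rate=0.75)
print(selected)  # ['f3', 'f1', 'f7']
```

In the method described above, a reduced feature set of this kind would then be passed to the PSO and SVM stage, which tunes the kernel parameters and the final feature subset together; that stage is not sketched here.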