SHAPLEY ADDITIVE EXPLANATION (SHAP) AS FEATURE SELECTION FOR ACUTE ARTERY DISEASE CLASSIFICATION MODEL DEVELOPMENT

Bibliographic Details
Main Author: Afif Rizky A, M
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/86189
Institution: Institut Teknologi Bandung
Description
Summary: Coronary heart disease is one of the leading causes of death worldwide, so accurate classification models for predicting it are important. However, coronary heart disease datasets are often small and low-dimensional, and using every feature in the classification model can increase the risk of overfitting. An appropriate feature selection method is therefore needed to choose the most relevant features. Some studies suggest that Shapley Additive Explanations (SHAP) has potential as a feature selection method. This study aims to demonstrate that SHAP can serve as a feature selection method for classification models built on small, low-dimensional coronary heart disease data. The experiment used a coronary heart disease dataset with these characteristics. SHAP-based feature selection was compared with two alternatives: Principal Component Analysis (PCA) and expert validation. Classification models were built with the random forest algorithm, and performance was evaluated with the ROC-AUC and AU-PRC metrics. The dataset was split into training and testing sets, and each model was tested in several experimental scenarios to assess the consistency of SHAP as a feature selection method. The experimental results show that using SHAP for feature selection improved the performance of the coronary heart disease classification model. ROC-AUC increased from 0.91 to 0.94 and AU-PRC from 0.81 to 0.97 after feature selection, compared with models using PCA and features selected through expert validation.
These findings demonstrate that SHAP enhances the accuracy and efficiency of random forest classification models for coronary heart disease, making it a highly useful feature selection method for small, low-dimensional datasets, especially in the context of coronary heart disease cases.
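
The idea of ranking features by their Shapley contributions and keeping the top few can be illustrated with a minimal Python sketch. It is not the thesis implementation: the thesis used a random forest, whereas this sketch uses a hypothetical toy linear model (`toy_model`), made-up sample values, and invented feature names. It computes exact Shapley values by enumerating feature coalitions, under the simplifying assumption that a feature left out of a coalition is replaced by its mean over the samples.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Assumption: a feature outside the coalition is replaced by its
    background value (here, the per-feature mean of the samples).
    """
    n = len(x)

    def value(coalition):
        masked = [x[j] if j in coalition else background[j] for j in range(n)]
        return predict(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(predict, samples, names, top_k):
    """Rank features by mean absolute Shapley value and keep the top_k."""
    n = len(names)
    background = [sum(row[j] for row in samples) / len(samples) for j in range(n)]
    importance = [0.0] * n
    for row in samples:
        for j, phi in enumerate(shapley_values(predict, row, background)):
            importance[j] += abs(phi) / len(samples)
    ranked = sorted(zip(names, importance), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical toy model and data, for illustration only.
def toy_model(row):
    return 2.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]


samples = [[1.0, 4.0, 7.0], [3.0, 0.0, 2.0], [5.0, 2.0, 9.0]]
names = ["feature_a", "feature_b", "feature_c"]
selected = rank_features(toy_model, samples, names, top_k=2)
print(selected)
```

Exact enumeration grows exponentially with the number of features, so it is only practical for low-dimensional data such as the dataset described above. In a setting like the thesis, the selected features would then be used to retrain the classifier and compare ROC-AUC and AU-PRC on the held-out test set.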