SHAPLEY ADDITIVE EXPLANATION (SHAP) AS FEATURE SELECTION FOR ACUTE ARTERY DISEASE CLASSIFICATION MODEL DEVELOPMENT

Coronary heart disease is one of the leading causes of global mortality, making it crucial to develop accurate classification models for predicting this condition. However, datasets for coronary heart disease are often small and low-dimensional, which can increase the risk of overfitting if all f...

Bibliographic Details
Main Author: Afif Rizky A, M
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/86189
Institution: Institut Teknologi Bandung
id id-itb.:86189
spelling id-itb.:86189 2024-09-16T14:15:52Z SHAPLEY ADDITIVE EXPLANATION (SHAP) AS FEATURE SELECTION FOR ACUTE ARTERY DISEASE CLASSIFICATION MODEL DEVELOPMENT Afif Rizky A, M Indonesia Theses Feature selection, Classification, Acute Artery Disease, Shapley Additive Explanation, SHAP INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/86189 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Coronary heart disease is one of the leading causes of death worldwide, so accurate classification models for predicting it are important. However, coronary heart disease datasets are often small and low-dimensional, which increases the risk of overfitting when every feature is used in the classification model. An appropriate feature selection method is therefore needed to choose the most relevant features. Some studies suggest that Shapley Additive Explanations (SHAP) has potential as a feature selection method. This study aims to demonstrate that SHAP can serve as a feature selection method for classification models built on small, low-dimensional coronary heart disease data. The experiment used a coronary heart disease dataset with these characteristics. SHAP was compared against two other feature selection approaches: Principal Component Analysis (PCA) and expert validation. Classification models were built with the random forest algorithm, and their performance in predicting coronary heart disease was evaluated using the ROC-AUC and AU-PRC metrics. The dataset was split into training and testing sets, and each model was tested in several experimental scenarios to assess the consistency of SHAP as a feature selection method. The results show that SHAP-based feature selection improved the performance of the coronary heart disease classification model: ROC-AUC rose from 0.91 to 0.94 and AU-PRC from 0.81 to 0.97 after feature selection was applied, compared to models using PCA and features selected through expert validation. These findings indicate that SHAP improves the accuracy and efficiency of random forest classification models for coronary heart disease, making it a useful feature selection method for small, low-dimensional datasets, particularly for coronary heart disease cases.
format Theses
author Afif Rizky A, M
title SHAPLEY ADDITIVE EXPLANATION (SHAP) AS FEATURE SELECTION FOR ACUTE ARTERY DISEASE CLASSIFICATION MODEL DEVELOPMENT
url https://digilib.itb.ac.id/gdl/view/86189
_version_ 1822283351623467008
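
The record does not include the thesis code, so the following is only a minimal sketch of the general idea behind Shapley-based feature selection described in the abstract: compute each feature's Shapley value per sample, average the absolute values, and keep the top-ranked features. It is a brute-force exact computation in plain Python, not the `shap` library and not the random forest pipeline used in the thesis. The model `toy_model`, the sample data, the all-zero baseline, and the choice of keeping two features are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value
    (an assumption of this sketch; other conventions exist).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(predict, samples, baseline):
    """Rank feature indices by mean absolute Shapley value, largest first."""
    n = len(baseline)
    mean_abs = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            mean_abs[i] += abs(p) / len(samples)
    order = sorted(range(n), key=lambda i: mean_abs[i], reverse=True)
    return order, mean_abs


def toy_model(z):
    # Hypothetical scoring function: feature 2 has no effect on the output.
    return 2.0 * z[0] + 0.5 * z[1] + 0.0 * z[2] + z[0] * z[1]


# Hypothetical data, three features per sample.
samples = [[1.0, 2.0, 5.0], [0.5, -1.0, 3.0], [2.0, 0.0, -4.0]]
baseline = [0.0, 0.0, 0.0]

order, importance = rank_features(toy_model, samples, baseline)
selected = order[:2]  # keep the two highest-ranked features
print("ranking:", order)
print("selected features:", selected)
```

The brute-force loop visits every coalition of features, so its cost grows exponentially with the number of features; it is practical only for low-dimensional data of the kind the abstract describes. For tree ensembles such as random forests, dedicated tree-based SHAP algorithms are normally used instead.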