COMPARISON SIMULATION OF CARDIOVASCULAR DISEASE PREDICTION WITH LOGISTIC REGRESSION AND DECISION TREE METHODS USING MACHINE LEARNING
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/76106 |
Institution: | Institut Teknologi Bandung |
Summary: | Cardiovascular disease is one of the leading causes of death worldwide and can strike unexpectedly, without any prior medical symptoms. It is therefore very important for the health sector to detect cardiovascular disease early, especially in individuals who already have risk factors or show certain indications. Because human capabilities are limited, technology has been increasingly adopted in the health sector, in particular machine learning, which is now widely used to build cardiovascular disease prediction models from patient medical record data. This study compares logistic regression and decision tree models, combined with the Tomek Links, SMOTETomek, and SMOTE-NC data resampling methods, in predicting cardiovascular disease. Logistic regression and decision trees were chosen to compare the performance of two of the simplest and most popular machine learning methods, taking into account their ability to handle small datasets while limiting the risk of overfitting. The SMOTE-NC, Tomek Links, and SMOTETomek resampling methods were used to compare the effect of oversampling, undersampling, and a combination of the two, respectively, on model training and on the prediction results. This Final Project uses a public dataset from Kaggle containing a total of 303 patient medical records from hospitals in Cleveland and VA Long Beach (United States), Hungary, and Switzerland. The class distribution is balanced and prediction models are built with each of the three resampling methods and each of the two machine learning methods in turn, then evaluated using a confusion matrix and the accuracy, precision, recall, and F1-score metrics. 
The results show that, for the dataset used, the SMOTE-NC data resampling method and the decision tree model performed best in predicting cardiovascular disease. |
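The evaluation step named in the summary (a confusion matrix plus accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch. This is not the author's code: the names `ConfusionMatrix`, `confusion_matrix`, and `scores` are hypothetical, the positive label `1` is assumed to mean "disease present", and the toy labels are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # positive cases predicted as positive
    fp: int  # negative cases predicted as positive
    fn: int  # positive cases predicted as negative
    tn: int  # negative cases predicted as negative


def confusion_matrix(y_true, y_pred, positive=1):
    """Count the four outcomes of a binary classifier (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def scores(cm):
    """Derive accuracy, precision, recall, and F1-score from the counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / (cm.tp + cm.fp) if (cm.tp + cm.fp) else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if (cm.tp + cm.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the study's dataset.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    print(scores(cm))
```

In a disease-screening setting, recall is often the metric to watch, since a false negative corresponds to a patient whose condition goes undetected; the F1-score balances it against precision.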