COMPARISON SIMULATION OF CARDIOVASCULAR DISEASE PREDICTION WITH LOGISTIC REGRESSION AND DECISION TREE METHODS USING MACHINE LEARNING

Cardiovascular disease is one of the leading causes of death worldwide and can strike unexpectedly, without prior symptoms. It is therefore important for the health sector to detect cardiovascular disease early, especially in individuals who already have risk factors or...

Bibliographic Details
Main Author: Noble Lawrence, Hansel
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/76106
Institution: Institut Teknologi Bandung
id id-itb.:76106
spelling id-itb.:76106 2023-08-10T15:19:29Z
Keywords: Cardiovascular Disease, Machine Learning, Logistic Regression, Decision Trees, Data Resampling
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Cardiovascular disease is one of the leading causes of death worldwide and can strike unexpectedly, without prior symptoms. It is therefore important for the health sector to detect cardiovascular disease early, especially in individuals who already have risk factors or certain indications. Because human capabilities are limited, technology has increasingly entered the health sector; machine learning in particular is now widely used to build cardiovascular disease prediction models from patient medical record data. This study compares logistic regression and decision tree models, combined with the Tomek Links, SMOTETomek, and SMOTE-NC data resampling methods, for predicting cardiovascular disease. Logistic regression and decision trees were chosen in order to compare two of the simplest and most popular machine learning methods, taking into account their ability to handle small datasets while limiting the risk of overfitting. The SMOTE-NC, Tomek Links, and SMOTETomek resampling methods were used to compare the effects of oversampling, undersampling, and a combination of the two, respectively, on model training and on the prediction results. This Final Project uses a public dataset from Kaggle containing a total of 303 patient medical records from hospitals in Cleveland and VA Long Beach (United States), Hungary, and Switzerland. The data classes are balanced with each of the three resampling methods in turn, prediction models are built with each of the two machine learning methods, and the models are then evaluated using a confusion matrix together with accuracy, precision, recall, and F1-score.
Based on the results of this study, for the dataset used, the SMOTE-NC resampling method combined with the decision tree model performed best in predicting cardiovascular disease.
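For readers unfamiliar with the techniques named in the description, the following is a minimal illustrative sketch in Python (standard library only) of two of them: Tomek Links undersampling and the confusion-matrix metrics. It is not the author's implementation, which the record does not describe. The function names, the one-dimensional toy data, and the example labels are all hypothetical, and SMOTE-NC, SMOTETomek, and the two classifiers are not shown.

```python
from math import dist


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels (Tomek links)."""
    def nearest(i):
        others = (j for j in range(len(X)) if j != i)
        return min(others, key=lambda j: dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority):
    """Undersample: drop the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: class 0 is the majority (4 samples), class 1 the minority (3).
X = [(0.0,), (1.0,), (1.2,), (5.0,), (5.4,), (9.0,), (9.5,)]
y = [0, 0, 1, 0, 0, 1, 1]
X_res, y_res = remove_majority_in_links(X, y, majority=0)

# Hypothetical true labels and predictions for the metric calculation.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In the toy data, samples 1 and 2 are each other's nearest neighbour and belong to different classes, so they form the only Tomek link; removing the majority-class member leaves three samples per class. In practice a maintained library would normally be used for resampling and evaluation rather than hand-written helpers like these.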