Predicting occurrence of diabetes related adverse events with machine learning techniques

Bibliographic Details
Main Author: Kwah, Yuki Yan Yu
Other Authors: -
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/139421
Institution: Nanyang Technological University
Description
Summary: Hypoglycaemia is a potentially life-threatening complication of diabetes treatment. It is defined as a blood sugar level below 4 mmol/L in individuals with diabetes. Inpatient hypoglycaemia among diabetes patients is frequently caused by a mismatch between diabetes treatment and other factors. Recurrent hypoglycaemia increases the risk that patients progress to hypoglycaemia unawareness, in which they no longer show warning symptoms such as shakiness and an irregular heartbeat that the body normally presents to signal low blood sugar. The overall prevalence of diabetes in Singapore rose from 9% in 1998 to 11.3% in 2010. Predictive models that forecast the recurrence of hypoglycaemia with high accuracy would allow an intervention to be put in place to prevent a second episode. This could shorten the length of stay of diabetes patients and help hospitals cope with the rising demand for acute medical beds, a pressing issue for acute hospitals in Singapore. In this paper, several machine learning methods were explored to predict inpatient recurrence of hypoglycaemia in diabetes patients. Decision trees, the K-nearest neighbour (K-NN) algorithm and random forests were compared with the more traditional logistic regression model and with an existing scoring system that motivated the topic. A data set of 205 patients and 25 predictor variables was introduced and analysed. All methods improved on the prediction accuracy of the scoring system. However, the logistic regression model achieved a higher accuracy (0.742) than any of the machine learning models, among which the random forest was the most accurate (0.661).
This study therefore highlighted the potential difficulties of applying machine learning to small data sets of moderate dimensionality, and discussed improvements for future studies of the topic. These include increasing the sample size of the data set, collecting additional clinical parameters so that more statistical methods can be applied, and, once a larger data set with better predictor variables is available, exploring more advanced machine learning techniques.
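The model comparison described in the summary can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the data are synthetic (generated with `make_classification` only to match the reported shape of 205 patients and 25 predictors), and the evaluation scheme (stratified 5-fold cross-validation) and hyperparameters such as `max_depth=4`, `n_neighbors=7` and `n_estimators=300` are hypothetical choices, since the record does not state how the models were tuned or validated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the study's data (205 patients x 25 predictors).
# Label 1 = recurrent hypoglycaemia, 0 = no recurrence. Values are randomly generated.
X, y = make_classification(
    n_samples=205, n_features=25, n_informative=6, random_state=0
)

# ASSUMPTION: hypothetical model settings; the project's actual hyperparameters are not given.
# Scaling is applied before logistic regression and K-NN, which are sensitive to feature scale.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

# Stratified folds keep the class ratio similar in each split,
# which helps when only about 200 samples are available.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Accuracy = fraction of held-out patients whose outcome is predicted correctly.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:20s} accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```

On a data set this small, the fold-to-fold spread in accuracy can be comparable to the gap between models, so reporting the standard deviation alongside the mean is a reasonable precaution when comparing methods.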