Time-to-event analysis on tracheal intubation prediction from time-series electrical health records

Bibliographic Details
Main Author: Deng, ZiChao
Other Authors: Shum Ping
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/75514
Institution: Nanyang Technological University
Description
Summary: ‘Big Data’ refers to collections of data that are huge in size and growing exponentially with time. It is characterized by three features: volume, velocity, and variety. Volume refers to the amount of data, velocity to the speed of data processing, and variety to the number of data types. As data volumes keep growing, the question is how to discover useful knowledge from this ever-growing ocean of data. Data mining is the process of extracting hidden patterns from huge databases and converting them into useful knowledge. It has been applied in various fields such as e-commerce, finance and business, education, telecommunications and, last but not least, clinical research. This final year project applied data mining to clinical research. Tracheal intubation is the placement of a flexible plastic tube into a patient’s trachea to facilitate ventilation of the lungs; it is an important clinical treatment for patients who are critically injured, ill or anesthetized. The objective of this project was to predict whether physicians will need to perform tracheal intubation on an ICU patient within the next 3 hours, based on the patient’s time-series clinical health records, using both conventional and novel data mining and deep learning algorithms. The project had four phases: data preparation, data preprocessing, data modeling and model evaluation. In the data preparation phase, the MIMIC-III Critical Care Database, a large, freely accessible relational database comprising clinical health records of over forty thousand patients, was loaded into a PostgreSQL database management system on an Amazon Web Services Elastic Compute Cloud (AWS EC2) Ubuntu Linux server. The data preprocessing phase had five sub-phases: feature selection, data integration, data cleaning, data transformation and dimensionality reduction.
The objective of data preprocessing was to select distinct ICU patients and their time-series clinical health records within 96 hours of ICU admission from the database. In the data modeling phase, seven machine learning models were applied to this classification problem: logistic regression, decision tree, support vector machine (SVM), deep neural network (DNN), recurrent neural network (RNN), long short-term memory (LSTM) network and bidirectional LSTM network. The first three are conventional machine learning models; the last four are deep learning models. In the model evaluation phase, the seven models were evaluated and compared to determine which achieved the best prediction accuracy.
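
The prediction target described in the summary (intubation within the next 3 hours, using records from within 96 hours of ICU admission) can be illustrated with a minimal labeling sketch. This is not the project's code. The hourly time grid, the function name `label_hours`, and the choice to stop labeling once intubation has occurred are all assumptions made for illustration; only the 3-hour horizon and the 96-hour window come from the summary.

```python
from typing import List, Optional

OBSERVATION_HOURS = 96  # window after ICU admission (from the summary)
HORIZON_HOURS = 3       # prediction horizon (from the summary)


def label_hours(intubation_hour: Optional[float]) -> List[Optional[int]]:
    """Label each hour t of the observation window (hypothetical helper).

    1    -> intubation occurs within (t, t + HORIZON_HOURS]
    0    -> no intubation within that horizon
    None -> intubation already happened by hour t, so there is nothing
            left to predict (an assumption, not stated in the summary)
    """
    labels: List[Optional[int]] = []
    for t in range(OBSERVATION_HOURS):
        if intubation_hour is None:
            labels.append(0)       # patient was never intubated
        elif intubation_hour <= t:
            labels.append(None)    # event is already in the past
        elif intubation_hour <= t + HORIZON_HOURS:
            labels.append(1)       # event falls inside the horizon
        else:
            labels.append(0)       # event is further than 3 hours away
    return labels


# Example: a patient intubated 10.5 hours after ICU admission.
example = label_hours(10.5)
positive_hours = [t for t, y in enumerate(example) if y == 1]
```

Each labeled hour would then be paired with the clinical measurements available up to that hour and passed to a classifier. Under this scheme positive labels are rare compared with negative ones, so accuracy alone may give an optimistic picture of model quality.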