Analysis of irregularly sampled time series health care data sets

Real-world healthcare systems generate abundant time-series data. In most cases, these data have a high prevalence of missing values and are irregularly sampled across both time and patients. Moreover, because data sets differ in complexity, preprocessing is both more significant an...

Bibliographic Details
Main Author:	Wang, Anni
Other Authors:	Ponnuthurai Nagaratnam Suganthan
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/154677
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-154677
record_format	dspace
spelling	sg-ntu-dr.10356-154677 2023-07-04T16:40:31Z Analysis of irregularly sampled time series health care data sets Wang, Anni Ponnuthurai Nagaratnam Suganthan School of Electrical and Electronic Engineering A*STAR Institute for Infocomm Research (I2R) Ramasamy Savitha EPNSugan@ntu.edu.sg Engineering::Electrical and electronic engineering Real-world healthcare systems generate abundant time-series data. In most cases, these data have a high prevalence of missing values and are irregularly sampled across both time and patients. Moreover, because data sets differ in complexity, preprocessing is both more significant and more challenging. This dissertation focuses on imputation and prediction tasks to address the challenges of irregularly sampled time series data sets. First, we trained the Recurrent Imputation for Time Series (RITS) model and a Bayesian Long Short-Term Memory (BLSTM) model on a publicly available PhysioNet data set for the prediction task only. Next, we trained a Bayesian LSTM for imputation of missing values (treating irregularly sampled points as missing values as well) and prediction of outcomes on a proprietary heart failure risk prediction data set. The proposed model represents the distribution of irregularly sampled time series data, imputes both categorical and continuous missing data in the time series, and predicts the outcome of interest. Furthermore, the Bayesian model allows for a reliable estimate of the outcome of interest. The missing continuous variables are imputed by minimizing the mean absolute error (MAE), while the categorical variables are imputed using softmax and/or argmax operations. Imbalance in the data set is addressed in the outcome prediction task through a weighted loss function. Performance results indicate that the proposed approach is effective in imputing both categorical and continuous variables and achieves superior prediction of the outcome of interest.
Master of Science (Computer Control and Automation) 2022-01-03T08:26:11Z 2022-01-03T08:26:11Z 2021 Thesis-Master by Coursework Wang, A. (2021). Analysis of irregularly sampled time series health care data sets. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/154677 https://hdl.handle.net/10356/154677 en D-255-20211-02961 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
spellingShingle	Engineering::Electrical and electronic engineering Wang, Anni Analysis of irregularly sampled time series health care data sets
description	Real-world healthcare systems generate abundant time-series data. In most cases, these data have a high prevalence of missing values and are irregularly sampled across both time and patients. Moreover, because data sets differ in complexity, preprocessing is both more significant and more challenging. This dissertation focuses on imputation and prediction tasks to address the challenges of irregularly sampled time series data sets. First, we trained the Recurrent Imputation for Time Series (RITS) model and a Bayesian Long Short-Term Memory (BLSTM) model on a publicly available PhysioNet data set for the prediction task only. Next, we trained a Bayesian LSTM for imputation of missing values (treating irregularly sampled points as missing values as well) and prediction of outcomes on a proprietary heart failure risk prediction data set. The proposed model represents the distribution of irregularly sampled time series data, imputes both categorical and continuous missing data in the time series, and predicts the outcome of interest. Furthermore, the Bayesian model allows for a reliable estimate of the outcome of interest. The missing continuous variables are imputed by minimizing the mean absolute error (MAE), while the categorical variables are imputed using softmax and/or argmax operations. Imbalance in the data set is addressed in the outcome prediction task through a weighted loss function. Performance results indicate that the proposed approach is effective in imputing both categorical and continuous variables and achieves superior prediction of the outcome of interest.
author2	Ponnuthurai Nagaratnam Suganthan
author_facet	Ponnuthurai Nagaratnam Suganthan Wang, Anni
format	Thesis-Master by Coursework
author	Wang, Anni
author_sort	Wang, Anni
title	Analysis of irregularly sampled time series health care data sets
title_short	Analysis of irregularly sampled time series health care data sets
title_full	Analysis of irregularly sampled time series health care data sets
title_fullStr	Analysis of irregularly sampled time series health care data sets
title_full_unstemmed	Analysis of irregularly sampled time series health care data sets
title_sort	analysis of irregularly sampled time series health care data sets
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/154677
_version_	1772827128265441280
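
Illustrative note on the abstract: the record does not include the thesis's implementation, so the following is only a minimal NumPy sketch of the three ingredients the description names: an MAE objective for continuous imputation, softmax followed by argmax for categorical imputation, and a class-weighted loss for an imbalanced outcome. All function names, the use of an observation mask, and the `pos_weight` form of the weighting are assumptions made for illustration, not the author's method.

```python
import numpy as np


def masked_mae(pred, target, observed):
    """Mean absolute error over observed entries only (assumed masking scheme)."""
    observed = observed.astype(float)
    # Guard against an all-missing batch to avoid division by zero.
    return float((np.abs(pred - target) * observed).sum() / max(observed.sum(), 1.0))


def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def impute_categorical(logits):
    """Pick the most probable category: softmax, then argmax."""
    return softmax(logits).argmax(axis=-1)


def weighted_bce(prob, label, pos_weight):
    """Binary cross-entropy with the positive class up-weighted (hypothetical weighting)."""
    p = np.clip(prob, 1e-7, 1 - 1e-7)
    loss = -(pos_weight * label * np.log(p) + (1 - label) * np.log(1 - p))
    return float(loss.mean())
```

In a sketch like this, `masked_mae` would score imputed continuous values only where ground truth is available, and `pos_weight` would typically be set larger than 1 when positive outcomes are rare; how the thesis actually chooses its weights is not stated in this record.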

Similar Items