Missing value imputation for diabetes prediction

Machine learning (ML) models have been widely used to improve the accuracy and efficiency of various types of disease diagnostic tasks. However, it is still challenging to apply ML models to perform diabetes-related prediction tasks mainly because patients' health records are sparse and have a...

Bibliographic Details
Main Authors: Luo, Fei; Qian, Hangwei; Wang, Di; Guo, Xu; Sun, Yan; Lee, Eng Sing; Teong, Hui Hwang; Lai, Ray Tian Rui; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Diabetes-Related Dataset; Diabetes Prediction
Online Access: https://hdl.handle.net/10356/164147
Institution: Nanyang Technological University
id sg-ntu-dr.10356-164147
record_format dspace
spelling sg-ntu-dr.10356-164147 2023-01-06T05:15:15Z Missing value imputation for diabetes prediction Luo, Fei Qian, Hangwei Wang, Di Guo, Xu Sun, Yan Lee, Eng Sing Teong, Hui Hwang Lai, Ray Tian Rui Miao, Chunyan School of Computer Science and Engineering 2022 International Joint Conference on Neural Networks (IJCNN) Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY) Engineering::Computer science and engineering Diabetes-Related Dataset Diabetes Prediction
AI Singapore National Research Foundation (NRF) Submitted/Accepted version This research is supported, in part, by the National Research Foundation (NRF), Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003). H. Qian thanks the support from the Wallenberg-NTU Presidential Postdoctoral Fellowship. 2023-01-06T05:07:17Z 2023-01-06T05:07:17Z 2022 Conference Paper Luo, F., Qian, H., Wang, D., Guo, X., Sun, Y., Lee, E. S., Teong, H. H., Lai, R. T. R. & Miao, C. (2022). Missing value imputation for diabetes prediction. 2022 International Joint Conference On Neural Networks (IJCNN). https://dx.doi.org/10.1109/IJCNN55064.2022.9892398 9781728186719 2161-4407 https://hdl.handle.net/10356/164147 10.1109/IJCNN55064.2022.9892398 2-s2.0-85140755664 en AISG-GC-2019-003 © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/IJCNN55064.2022.9892398. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Diabetes-Related Dataset
Diabetes Prediction
description Machine learning (ML) models have been widely used to improve the accuracy and efficiency of various disease diagnostic tasks. However, applying ML models to diabetes-related prediction tasks remains challenging, mainly because patients' health records are sparse and contain a vast number of missing values. Missing values often break diabetes prediction pipelines, posing challenges to existing approaches. This problem worsens significantly when critical attribute values (e.g., blood test results for HbA1c, FPG and OGTT2hr) are missing. In this paper, we introduce a large-scale diabetes-related dataset named the Chronic Disease Management System (CDMS) dataset, which collects the clinical records of more than 700,000 visits by over 65,000 patients across eight years. CDMS is anonymously collected and has a high percentage of missing values on several attributes that are critical for diabetes prediction. If not handled carefully, the missing values will significantly degrade the performance of the applied ML models. In this paper, we also investigate the effectiveness of multiple data imputation methods by conducting extensive experiments on CDMS. Experimental results show that k-Nearest Neighbor Imputation (KNNI) performs better than the other methods in this diabetes prediction task. Specifically, with KNNI applied, the diabetes prediction accuracy and precision are both over 0.8 using various ML predictive models.
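The description above names k-Nearest Neighbor Imputation (KNNI) as the best-performing method but this record gives no implementation details. The following is a minimal sketch of the general technique, not the authors' code. It assumes three numeric features in the order HbA1c, FPG, OGTT2hr, uses `None` for a missing value, measures distance only over features that both rows have observed, and fills each gap with the mean over the k nearest rows that do have the value. The function names, the value of `k`, and the toy rows are all hypothetical and are not taken from CDMS.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so
    that rows with fewer shared features are not favoured artificially.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by a k-NN mean.

    Neighbours are searched among the ORIGINAL rows, so earlier imputed
    values never influence later ones. A gap is left as None when no
    comparable row has the feature observed.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [(d, v) for d, v in donors if d != math.inf][:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical toy records, columns: HbA1c, FPG, OGTT2hr (illustrative only).
records = [
    [5.4, 5.0, 6.8],
    [5.6, 5.2, 7.0],
    [7.9, 8.5, 13.0],
    [8.1, None, 13.4],   # FPG missing
    [None, 5.1, 6.9],    # HbA1c missing
    [8.3, 9.0, 13.8],
]

filled = knn_impute(records, k=2)
for row in filled:
    print(row)
```

In this sketch the features are used on their raw scales. With real clinical data the features would usually be standardised first so that no single attribute dominates the distance, and `k` would be chosen by validation rather than fixed in advance.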
author2 School of Computer Science and Engineering
format Conference or Workshop Item
author Luo, Fei
Qian, Hangwei
Wang, Di
Guo, Xu
Sun, Yan
Lee, Eng Sing
Teong, Hui Hwang
Lai, Ray Tian Rui
Miao, Chunyan
author_sort Luo, Fei
title Missing value imputation for diabetes prediction
publishDate 2023
url https://hdl.handle.net/10356/164147
_version_ 1754611287804870656