Missing data characteristics and the choice of imputation technique: an empirical study
One important characteristic of good data is completeness. Missing data is a major problem in the classification of medical datasets. It leads to incorrect classification of patients, which endangers their health management. Many imputation techniques have been employed to solve this pro...
Main Authors: | Alade, Oyekale Abel; Sallehuddin, Roselina; Mohamed Radzi, Nor Haizan; Selamat, Ali |
---|---|
Format: | Conference or Workshop Item |
Published: | 2020 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/93785/ http://dx.doi.org/10.1007/978-3-030-33582-3_9 |
Institution: | Universiti Teknologi Malaysia |
id | my.utm.93785 |
---|---|
record_format | eprints |
spelling | my.utm.93785; 2021-12-31T08:51:00Z; http://eprints.utm.my/id/eprint/93785/; 2020-01; Conference or Workshop Item; PeerReviewed; Alade, Oyekale Abel and Sallehuddin, Roselina and Mohamed Radzi, Nor Haizan and Selamat, Ali (2020) Missing data characteristics and the choice of imputation technique: an empirical study. In: 4th International Conference of Reliable Information and Communication Technology, IRICT 2019, 22 September 2019 - 23 September 2019, Johor Bahru, Johor, Malaysia. http://dx.doi.org/10.1007/978-3-030-33582-3_9 |
institution | Universiti Teknologi Malaysia |
building | UTM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Teknologi Malaysia |
content_source | UTM Institutional Repository |
url_provider | http://eprints.utm.my/ |
topic | QA75 Electronic computers. Computer science |
description | One important characteristic of good data is completeness. Missing data is a major problem in the classification of medical datasets. It leads to incorrect classification of patients, which endangers their health management. Many imputation techniques have been employed to solve this problem, but these techniques are applied without regard to the characteristics that cause the missingness. In this paper, we investigated the causes of missing data in a medical dataset and proposed a multiple imputation technique to solve the problem of missing data. A 5-fold-iteration multiple imputation was employed. All of the missing values in the dataset were regenerated (100%). The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show an improvement in the accuracy of the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets with different classifiers. |
format | Conference or Workshop Item |
author | Alade, Oyekale Abel; Sallehuddin, Roselina; Mohamed Radzi, Nor Haizan; Selamat, Ali |
author_sort | Alade, Oyekale Abel |
title | Missing data characteristics and the choice of imputation technique: an empirical study |
publishDate | 2020 |