Comparative analysis of imputation methods for enhancing predictive accuracy in data models

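The presence of missing values within datasets can introduce detrimental bias, significantly impeding a predictive algorithm's ability to discern patterns and make accurate predictions. This paper aims to elucidate the intricacies of data imputation, providing a deeper understanding of prevalent imputation methods, including list-wise deletion (IGN), mean imputation (AVG), K-Nearest Neighbors (KNN), MissForest (MF), and Predictive Mean Matching (PMM). The dataset employed in this study consists of financial data on S&P 500 companies from the Compustat North America database. The training and validation dataset encompasses 1973 instances, covering the fourth quarter of 2009, the first quarter of 2010, and the third quarter of 2014. Within this set, 457 missing values were identified and imputed. The test dataset comprises 197 randomly selected instances from the fourth quarter of 2014, equivalent to ten percent of the number of instances in the training dataset. In the evaluation, the dataset produced by MF imputation performed best among all the imputed datasets. The insights from this study are intended to help practitioners choose the most suitable data imputation method, particularly for predictive modeling tasks.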
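The abstract names the compared methods without describing how they work. As a rough illustration only (this is not the authors' code, which the record does not include), the three simplest of them can be sketched in plain Python, assuming a small numeric table stored as a list of rows with `None` marking a missing value. The helper names `listwise_deletion`, `mean_imputation` and `knn_imputation` and the toy `data` table are hypothetical. MissForest and PMM are left out because they depend on iterative random-forest and regression models.

```python
import math
from statistics import mean

# Hypothetical toy table: each row is a list of floats; None marks a missing value.


def listwise_deletion(rows):
    """IGN: keep only the rows in which every value is observed."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def mean_imputation(rows):
    """AVG: replace each missing value with its column's mean over observed values."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _shared_distance(a, b):
    """Euclidean distance over the columns observed in both rows (inf if none)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_imputation(rows, k=2):
    """KNN: fill each missing cell with the mean of that column over the k
    nearest other rows that observe the column (distances use original values)."""
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate donors: every other row that has column j observed.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            donors.sort(key=lambda r: _shared_distance(row, r))
            filled[j] = mean(r[j] for r in donors[:k])  # raises if no donor exists
        result.append(filled)
    return result


data = [
    [1.0, 2.0, 3.0],
    [2.0, None, 4.0],
    [3.0, 6.0, None],
    [10.0, 20.0, 30.0],
]
print(listwise_deletion(data))  # [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(mean_imputation(data))
print(knn_imputation(data, k=2))  # [[1.0, 2.0, 3.0], [2.0, 4.0, 4.0], [3.0, 6.0, 3.5], [10.0, 20.0, 30.0]]
```

The toy run shows the trade-offs the study is concerned with: deletion discards half the rows, mean imputation pulls the gaps toward the outlying fourth row, and KNN borrows values from similar rows instead. This sketch measures distance only over co-observed columns without rescaling; scikit-learn's `KNNImputer`, for comparison, defaults to a `nan_euclidean` metric that scales distances up to account for missing coordinates, so its results can differ.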


Bibliographic Details
Main Authors:	Nurul Aqilah, Zamri; Mohd Izham, Mohd Jaya; Irawati, Indrarini Dyah; Rassem, Taha H.; Rasyidah; Shahreen, Kasim
Format:	Article
Language:	English
Published:	Politeknik Negeri Padang 2024
Subjects:	QA75 Electronic computers. Computer science; T Technology (General)
Online Access:	http://umpir.ump.edu.my/id/eprint/42748/1/Comparative%20analysis%20of%20imputation%20methods%20for%20enhancing%20predictive%20accuracy%20in%20data%20models.pdf http://umpir.ump.edu.my/id/eprint/42748/ https://www.joiv.org/index.php/joiv/article/view/1666
Institution:	Universiti Malaysia Pahang Al-Sultan Abdullah

Citation:	Nurul Aqilah, Zamri and Mohd Izham, Mohd Jaya and Irawati, Indrarini Dyah and Rassem, Taha H. and Rasyidah and Shahreen, Kasim (2024) Comparative analysis of imputation methods for enhancing predictive accuracy in data models. International Journal on Informatics Visualization, 8 (3). pp. 1271-1276. ISSN 2549-9904. (Published)
Status:	Peer reviewed
License:	CC BY-NC-SA 4.0
