Cold deck missing value imputation with a trust-based selection method of multiple web donors

Bibliographic Details
Main Author: Mohd Jaya, Mohd Izham
Format: Thesis
Language: English
Published: 2018
Online Access:http://psasir.upm.edu.my/id/eprint/83236/1/FSKTM%202018%2079%20-ir.pdf
http://psasir.upm.edu.my/id/eprint/83236/
Institution: Universiti Putra Malaysia
Description
Summary: Missing values are a common problem in datasets, and their occurrence decreases data completeness. They also reduce data quality and negatively impact the results of data analysis. Existing cold deck imputation copes with this problem by selecting a replacement value from a pool of donors identified in other data sources during the imputation process. Compared with other imputation methods, existing cold deck imputation carries less risk of model misspecification and preserves the data distribution of the dataset. Its limitation, however, is that the chance of finding a trusted, plausible donor is narrow because each imputation process uses a single data source. The many web data sources available today alleviate this limitation. However, values from multiple web data sources commonly conflict with each other, so adopting existing cold deck imputation with multiple web donors is not practical: no trust score is measured for each of the conflicting values, which makes it difficult to select the most plausible value during the imputation process. This research concentrates on improving data completeness by imputing missing values with a trust-based cold deck imputation method, Trust Based Cold Deck Missing Values Imputation with Multiple Web Donor. The proposed method draws on multiple web donors from web data sources to increase the chance of finding the most plausible values for imputation. The plausible values are selected using a novel trust score, which is measured from the accuracy score and the reliability score of each web donor. The performance of the proposed method is evaluated by running a prediction model on the imputed datasets. A number of experiments are carried out to quantify the prediction model's accuracy, Root Mean Squared Error (RMSE), and F-Measure.
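The abstract does not state how the accuracy and reliability scores are combined or how conflicting donor values are reconciled. The following is a minimal Python sketch under stated assumptions, not the thesis's actual method: the WebDonor class, the trust_score, impute and cold_deck_fill functions, the weighted-average combination with weight alpha, and the rule of summing trust across donors that report the same value are all hypothetical. The donor data is made-up toy data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WebDonor:
    """Hypothetical web data source that reports values for the target attribute."""
    name: str
    accuracy: float      # assumed to lie in [0, 1]
    reliability: float   # assumed to lie in [0, 1]
    values: Dict[str, str] = field(default_factory=dict)  # record key -> reported value


def trust_score(donor: WebDonor, alpha: float = 0.5) -> float:
    # Assumption: trust is a weighted average of accuracy and reliability.
    # The thesis derives trust from both scores, but this formula is illustrative.
    return alpha * donor.accuracy + (1 - alpha) * donor.reliability


def impute(key: str, donors: List[WebDonor], alpha: float = 0.5) -> Optional[str]:
    # Sum the trust of every donor backing each (possibly conflicting) candidate value.
    support: Dict[str, float] = defaultdict(float)
    for donor in donors:
        if key in donor.values:
            support[donor.values[key]] += trust_score(donor, alpha)
    if not support:
        return None  # no donor covers this record, so the value stays missing
    # Choose the value with the highest accumulated trust; iterating in sorted
    # order makes max() break ties alphabetically.
    return max(sorted(support), key=lambda value: support[value])


def cold_deck_fill(records: List[Dict[str, Optional[str]]], key_field: str,
                   target: str, donors: List[WebDonor], alpha: float = 0.5):
    # Replace only missing (None) target values; observed values are left untouched.
    for record in records:
        if record.get(target) is None:
            record[target] = impute(record[key_field], donors, alpha)
    return records


# Toy example: three hypothetical sources disagree about one missing value.
donors = [
    WebDonor("site_a", accuracy=0.9, reliability=0.8, values={"Serdang": "Selangor"}),
    WebDonor("site_b", accuracy=0.6, reliability=0.5, values={"Serdang": "Selangor"}),
    WebDonor("site_c", accuracy=0.95, reliability=0.95, values={"Serdang": "Kedah"}),
]
records = [{"city": "Serdang", "state": None}, {"city": "Ipoh", "state": "Perak"}]
print(cold_deck_fill(records, "city", "state", donors))
# prints [{'city': 'Serdang', 'state': 'Selangor'}, {'city': 'Ipoh', 'state': 'Perak'}]
```

Here "Selangor" accumulates a trust of 0.85 + 0.55 = 1.40 against 0.95 for "Kedah". Summing trust across agreeing donors is one design choice; taking the value of the single most trusted donor would be another, and the abstract does not say which the thesis uses.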
The results demonstrate that the proposed method improves on the performance of existing cold deck imputation. The results are also compared with other imputation methods: K-Nearest Neighbor (KNN), Mean Imputation (AVG), Case Deletion (IGN), Predictive Mean Matching (PMM) and MissForest. They show that RMSE, prediction accuracy and F-Measure all improve when the prediction model is trained on datasets imputed with the proposed method. This research contributes to improving data quality, particularly in the information systems (IS) and database fields, where good data quality benefits data analysis performance.
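The abstract reports RMSE, prediction accuracy and F-Measure without defining them. Their standard definitions are sketched below in Python for reference. The function names are hypothetical, and the F-Measure is shown for a binary target with a single positive class; the thesis may use a multi-class or averaged variant.

```python
import math
from typing import Hashable, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Root Mean Squared Error: square root of the mean squared residual (lower is better).
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def accuracy(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    # Fraction of predictions that exactly match the true label.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def f_measure(actual: Sequence[Hashable], predicted: Sequence[Hashable],
              positive: Hashable = 1) -> float:
    # Binary F-Measure (F1): harmonic mean of precision and recall for `positive`.
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    if tp == 0:
        return 0.0  # precision and recall are zero or undefined, so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(rmse([0, 0, 0, 0], [2, 2, 2, 2]))                  # prints 2.0
print(accuracy([1, 0, 1, 1], [1, 1, 0, 1]))              # prints 0.5
print(round(f_measure([1, 0, 1, 1], [1, 1, 0, 1]), 4))   # prints 0.6667
```

Note that an "improved" RMSE means a lower value, whereas improved accuracy and F-Measure mean higher values.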