ProJect: a powerful mixed-model missing value imputation method

Bibliographic Details
Main Authors:	Kong, Weijia; Wong, Bertrand Jern Han; Hui, Harvard Wai Hann; Lim, Kai-Peng; Wang, Yulan; Wong, Limsoon; Goh, Wilson Wen Bin
Other Authors:	School of Biological Sciences
Format:	Article
Language:	English
Published:	2023
Subjects:	Science::Biological sciences; Bioinformatics; Missing at Random
Online Access:	https://hdl.handle.net/10356/171093
Institution:	Nanyang Technological University

Citation:	Kong, W., Wong, B. J. H., Hui, H. W. H., Lim, K., Wang, Y., Wong, L. & Goh, W. W. B. (2023). ProJect: a powerful mixed-model missing value imputation method. Briefings in Bioinformatics, 24(4), bbad233. https://dx.doi.org/10.1093/bib/bbad233
Journal:	Briefings in Bioinformatics (ISSN 1467-5463)
DOI:	10.1093/bib/bbad233
PubMed ID:	37419612
Scopus ID:	2-s2.0-85165521396
Version:	Submitted/Accepted version
Funding:	Ministry of Education (MOE). This work is supported in part by a Singapore Ministry of Education tier-2 grant (MOE2019-T2-1-042) and a Singapore Ministry of Education tier-1 grant (RG35/20).
Deposited:	2023-10-12
Rights:	© 2023 The Author(s). Published by Oxford University Press. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1093/bib/bbad233.
Collection:	DR-NTU (NTU Library, Singapore)

Abstract

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we used renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets.

Our results demonstrate that ProJect consistently outperforms the other MVI methods tested. It achieves the lowest normalized root mean square error (NRMSE), on average 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method. It also achieves the lowest Procrustes sum of squared error (Procrustes SS), with 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared with the next-best method. ProJect also leads with the highest correlation coefficient across all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM than the second-best method).

ProJect's key strength is its ability to handle the different types of MVs commonly found in real-world data. Unlike most MVI methods, which are designed to handle only one type of MV, ProJect uses a decision-making algorithm that first determines whether an MV is missing at random (MAR) or missing not at random (MNAR). It then applies a targeted imputation strategy to each MV type, yielding more accurate and reliable imputation outcomes.
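This record does not spell out ProJect's decision rule. What follows is therefore only a minimal Python sketch of the general "classify, then impute per type" idea, not the authors' algorithm. It assumes a features-by-samples matrix in which `None` marks an MV, and it uses a made-up rule: a feature counts as MNAR when it is mostly missing and low in abundance (a typical left-censoring pattern in MS proteomics), and as MAR otherwise. The function names, the thresholds (`low_quantile`, `mnar_min_missing`) and both fill strategies (smallest observed value for MNAR, row mean for MAR) are hypothetical stand-ins for the targeted imputers a real method would use.

```python
from statistics import mean
from typing import List, Optional, Tuple

MISSING = None  # marker for a missing value (MV)
Matrix = List[List[Optional[float]]]


def classify_features(data: Matrix, low_quantile: float = 0.25,
                      mnar_min_missing: float = 0.5) -> List[str]:
    """Label each row (feature) 'MAR' or 'MNAR' using a toy, hypothetical rule.

    Not ProJect's published criterion: a feature is 'MNAR' when it is missing
    in at least `mnar_min_missing` of samples AND its observed mean is at or
    below the `low_quantile` of all feature means (a left-censoring pattern).
    Features with no observed values are also treated as 'MNAR'.
    """
    observed = [[v for v in row if v is not MISSING] for row in data]
    means = sorted(mean(obs) for obs in observed if obs)
    cutoff = means[int(low_quantile * (len(means) - 1))] if means else float("-inf")

    labels = []
    for row, obs in zip(data, observed):
        frac_missing = 1 - len(obs) / len(row)
        if not obs or (frac_missing >= mnar_min_missing and mean(obs) <= cutoff):
            labels.append("MNAR")
        else:
            labels.append("MAR")
    return labels


def impute_mixed(data: Matrix, **rule_kwargs) -> Tuple[List[List[float]], List[str]]:
    """Impute MNAR features with a low value and MAR features with their row mean."""
    labels = classify_features(data, **rule_kwargs)
    # Stand-in for a left-censored imputer: the smallest observed value overall.
    floor = min(v for row in data for v in row if v is not MISSING)

    imputed = []
    for row, label in zip(data, labels):
        obs = [v for v in row if v is not MISSING]
        # MAR rows always have observed values, so mean(obs) is defined here.
        fill = floor if label == "MNAR" else mean(obs)
        imputed.append([fill if v is MISSING else v for v in row])
    return imputed, labels


# Toy log-intensity matrix: rows are features, columns are samples.
data = [
    [20.1, 20.5, None, 20.3],  # high abundance, one gap
    [12.0, None, None, None],  # low abundance, mostly missing
    [18.0, 18.4, 18.2, 18.6],  # complete
    [15.0, 15.2, None, 15.4],  # mid abundance, one gap
]
imputed, labels = impute_mixed(data)
print(labels)  # ['MAR', 'MNAR', 'MAR', 'MAR']
```

This toy rule decides per feature, so every MV within a feature receives the same label. A finer, per-value decision would need a different rule.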
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
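The abstract reports NRMSE and correlation comparisons but gives no formulas. A common way to score an imputation method, assumed here, is to hide known values, impute them, and compare imputed against true values only at the hidden positions. The minimal Python sketch below makes three choices:

- NRMSE uses one widely used definition: RMSE divided by the population standard deviation of the hidden true values.
- Correlation is Pearson's r, computed over the same hidden positions.
- A small helper computes a relative error reduction such as "45.92% less error".

Whether the paper uses exactly these definitions is an assumption. The names `masked_pairs`, `nrmse`, `pearson` and `percent_less` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def masked_pairs(truth: Grid, imputed: Grid, mask: List[List[bool]]) -> List[Tuple[float, float]]:
    """Collect (true, imputed) pairs at positions that were artificially hidden (mask is True)."""
    return [
        (t, p)
        for t_row, p_row, m_row in zip(truth, imputed, mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m
    ]


def nrmse(truth: Grid, imputed: Grid, mask: List[List[bool]]) -> float:
    """RMSE over hidden entries divided by the (population) std of their true values."""
    pairs = masked_pairs(truth, imputed, mask)
    n = len(pairs)
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in pairs) / n)
    mu = sum(t for t, _ in pairs) / n
    sd = math.sqrt(sum((t - mu) ** 2 for t, _ in pairs) / n)
    return rmse / sd


def pearson(truth: Grid, imputed: Grid, mask: List[List[bool]]) -> float:
    """Pearson correlation between true and imputed values at hidden entries."""
    pairs = masked_pairs(truth, imputed, mask)
    n = len(pairs)
    mt = sum(t for t, _ in pairs) / n
    mp = sum(p for _, p in pairs) / n
    cov = sum((t - mt) * (p - mp) for t, p in pairs)
    st = math.sqrt(sum((t - mt) ** 2 for t, _ in pairs))
    sp = math.sqrt(sum((p - mp) ** 2 for _, p in pairs))
    return cov / (st * sp)


def percent_less(method_error: float, competitor_error: float) -> float:
    """Relative error reduction (%) of one method versus a competitor."""
    return 100 * (competitor_error - method_error) / competitor_error


truth = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
imputed = [[1.0, 2.5], [3.5, 4.0], [5.0, 5.0]]
mask = [[False, True], [True, False], [False, True]]  # True = value was hidden

print(nrmse(truth, imputed, mask))
print(pearson(truth, imputed, mask))
print(percent_less(0.5, 1.0))  # 50.0
```

For the Procrustes SS, SciPy's `scipy.spatial.procrustes(data1, data2)` returns a `disparity`. This is the sum of squared differences between two configurations after centering, scaling and optimally rotating them. The record does not state which configurations the authors aligned (for example, PCA scores of complete versus imputed data).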
