Concordance-based batch effect correction for large-scale metabolomics

For a large-scale metabolomics study, sample collection, preparation, and analysis may last several days, months, or even (intermittently) over years. This may lead to apparent batch effects in the acquired metabolomics data due to variability in instrument status, environmental conditions, or exper...


Bibliographic Details
Main Authors: Guo, Fanjing; Lin, Genjin; Dong, Liheng; Cheng, Kian-Kai; Deng, Lingli; Xu, Xiangnan; Raftery, Daniel; Dong, Jiyang
Format: Article
Published: American Chemical Society 2023
Subjects: TP Chemical technology
Online Access: http://eprints.utm.my/105012/
http://dx.doi.org/10.1021/acs.analchem.2c05748
Institution: Universiti Teknologi Malaysia
id my.utm.105012
record_format eprints
spelling my.utm.105012 2024-04-01T07:45:36Z http://eprints.utm.my/105012/ Concordance-based batch effect correction for large-scale metabolomics TP Chemical technology American Chemical Society 2023 Article PeerReviewed Guo, Fanjing and Lin, Genjin and Dong, Liheng and Cheng, Kian-Kai and Deng, Lingli and Xu, Xiangnan and Raftery, Daniel and Dong, Jiyang (2023) Concordance-based batch effect correction for large-scale metabolomics. Analytical Chemistry, 95 (18). pp. 7220-7228. ISSN 0003-2700 http://dx.doi.org/10.1021/acs.analchem.2c05748 DOI: 10.1021/acs.analchem.2c05748
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic TP Chemical technology
description For a large-scale metabolomics study, sample collection, preparation, and analysis may last several days or months, or even extend intermittently over years. This may lead to apparent batch effects in the acquired metabolomics data due to variability in instrument status, environmental conditions, or experimental operators. Batch effects may confound the true biological relationships among metabolites and thus obscure real metabolic changes. At present, most of the commonly used batch effect correction (BEC) methods are based on quality control (QC) samples and therefore require sufficient and stable QC samples. However, the quality of the QC samples may deteriorate if the experiment lasts for a long time. Alternatively, isotope-labeled internal standards have been used, but they generally do not provide good coverage of the metabolome. BEC can also be conducted through a data-driven method, in which no QC sample is needed. Here, we propose a novel data-driven BEC method, namely, CordBat, to achieve concordance between each batch of samples. In the proposed CordBat method, a reference batch is first selected from all batches of data, and the remaining batches are referred to as “other batches.” The reference batch serves as the baseline for the batch adjustment by providing a coordinate of correlation between metabolites. Next, a Gaussian graphical model is built on the combined dataset of reference and other batches. Finally, BEC is achieved by optimizing the correction coefficients in the other batches so that the correlations between metabolites in each batch and in their combinations are in concordance with those of the reference batch. Three real-world metabolomics datasets were used to evaluate the performance of CordBat by comparing it with five commonly used BEC methods. The experimental results showed the effectiveness of CordBat in batch effect removal and the concordance of correlation between metabolites after BEC. CordBat was found to be comparable to the QC-based methods and achieved better performance in the preservation of biological effects. The proposed CordBat method may serve as an alternative BEC method for large-scale metabolomics studies that lack proper QC samples.
format Article
author Guo, Fanjing
Lin, Genjin
Dong, Liheng
Cheng, Kian-Kai
Deng, Lingli
Xu, Xiangnan
Raftery, Daniel
Dong, Jiyang
author_sort Guo, Fanjing
title Concordance-based batch effect correction for large-scale metabolomics
publisher American Chemical Society
publishDate 2023
url http://eprints.utm.my/105012/
http://dx.doi.org/10.1021/acs.analchem.2c05748
_version_ 1797905736519385088
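
Illustrative sketch (not part of the record): the description above says that correction coefficients are applied to the other batches so that metabolite correlations in the combined data agree with those of the reference batch. The Python snippet below is a rough toy illustration of that idea only, and it is not the CordBat algorithm: it fits no Gaussian graphical model and runs no optimization. As a stand-in, it uses a simple per-metabolite location-scale alignment (coefficients a and b in a * x + b) on simulated data. All function names, variable names, and simulated values are assumptions made for this example.

```python
import numpy as np


def correlation_discordance(reference, combined):
    """Frobenius distance between two metabolite correlation matrices."""
    r_ref = np.corrcoef(reference, rowvar=False)
    r_all = np.corrcoef(combined, rowvar=False)
    return float(np.linalg.norm(r_ref - r_all))


def align_to_reference(reference, other):
    """Per-metabolite affine correction a * x + b.

    Stand-in technique (location-scale alignment), NOT CordBat:
    a and b are chosen so that each metabolite in `other` gets the
    mean and standard deviation it has in `reference`.
    """
    a = reference.std(axis=0, ddof=1) / other.std(axis=0, ddof=1)
    b = reference.mean(axis=0) - a * other.mean(axis=0)
    return a * other + b


# Simulated data (hypothetical): two batches drawn from the same
# correlated population, rows = samples, columns = metabolites.
rng = np.random.default_rng(0)
n_samples, n_metabolites = 500, 5
mixing = rng.normal(size=(n_metabolites, n_metabolites))
reference = rng.normal(size=(n_samples, n_metabolites)) @ mixing
other_true = rng.normal(size=(n_samples, n_metabolites)) @ mixing

# Simulated batch effect: a per-metabolite scale and shift.
scale = np.array([0.5, 2.0, 1.5, 0.7, 1.2])
shift = np.array([4.0, -3.0, 5.0, -4.0, 3.0])
other_observed = other_true * scale + shift

# Discordance of the combined data before and after correction.
before = correlation_discordance(
    reference, np.vstack([reference, other_observed])
)
corrected = align_to_reference(reference, other_observed)
after = correlation_discordance(
    reference, np.vstack([reference, corrected])
)

print(f"discordance before correction: {before:.3f}")
print(f"discordance after correction:  {after:.3f}")
```

One point the toy example makes visible: a positive per-metabolite scale and shift leaves the correlations within a single batch unchanged, so the distortion only appears once batches are pooled. This is consistent with the description's emphasis on the combined dataset of reference and other batches.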