Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers

Background: The lack of consensus among reported gene signature subsets (GSSs) in multi-gene biomarker discovery studies is often a concern for researchers and clinicians. Consequently, it discourages larger-scale prospective studies, prevents the translation of such knowledge into a practical clini...

Bibliographic Details
Main Authors: Ow, Ghim Siong, Kuznetsov, Vladimir Andreevich
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering; Breast Cancer; Ovarian Cancer
Online Access: https://hdl.handle.net/10356/88541
http://hdl.handle.net/10220/45907
Institution: Nanyang Technological University
id sg-ntu-dr.10356-88541
record_format dspace
spelling sg-ntu-dr.10356-88541
2022-02-16T16:27:19Z
ASTAR (Agency for Sci., Tech. and Research, S’pore)
Published version
2018-09-10T05:52:08Z
2019-12-06T17:05:38Z
2015
Journal Article
Ow, G. S., & Kuznetsov, V. A. (2015). Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers. BMC Genomics, 16(Suppl 7), S2-. doi:10.1186/1471-2164-16-S7-S2
https://hdl.handle.net/10356/88541
http://hdl.handle.net/10220/45907
10.1186/1471-2164-16-S7-S2
26100469
en
BMC Genomics
© 2015 Ow and Kuznetsov; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
14 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
Breast Cancer
Ovarian Cancer
description Background: The lack of consensus among reported gene signature subsets (GSSs) in multi-gene biomarker discovery studies is often a concern for researchers and clinicians. Consequently, it discourages larger-scale prospective studies, prevents the translation of such knowledge into a practical clinical setting and ultimately hinders progress in biomarker-based disease classification, prognosis and prediction.
Methods: We define all "gene identificators" (gIDs) as constituents of the entire potential disease biomarker space. For each gID in a GSS of interest ("tested GSS", tGSS), our method counts the empirical frequency of gID co-occurrences (overlaps) in other reference GSSs (rGSSs) and compares it with the expected frequency generated by a randomized sampling procedure. Comparing the empirical frequency distribution (EFD) with the expected background frequency distribution (BFD) allows the gIDs within the tGSS to be dichotomized into statistically novel (SN) and statistically common (SC) gIDs.
Results: We identify SN or SC biomarkers for tGSSs obtained from previous studies of high-grade serous ovarian cancer (HG-SOC) and breast cancer (BC). For each tGSS, the EFD of gID co-occurrences (overlaps) with other rGSSs is characterized by a scale- and context-dependent Pareto-like frequency distribution function. Our results indicate that, while there is little overlap between our tGSS and any individual rGSS, comparison of the EFD with the BFD suggests that beyond a confidence threshold, tested gIDs become more common in rGSSs than expected. This validates the use of our tGSS as individual or combined prognostic factors. Our method identifies SN and SC genes of a 36-gene prognostic signature that stratifies HG-SOC patients into subgroups with low, intermediate or high risk of the disease outcome. Using 70 BC rGSSs, the method also predicted SN and SC BC prognostic genes from the tested obesity and IGF1 pathway GSSs.
Conclusions: Our method provides a strategy that identifies or predicts, within a tGSS of interest, gID subsets that are either SN or SC when compared to other rGSSs. Practically, our results suggest that the IGF1 signature genes are more strongly associated with the 70 BC rGSSs than the obesity-associated signature genes are. Furthermore, both SC and SN genes in both signatures could be considered as prospective prognostic biomarkers of BC that stratify patients into groups at low or high risk of cancer development.
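The Methods paragraph above compares empirical co-occurrence counts with a background generated by randomized sampling. The following Python sketch illustrates that general idea only; it is not the authors' implementation. The function names, the toy gene labels and, in particular, the threshold rule (the smallest co-occurrence level at which the observed tail count exceeds the expected tail count) are assumptions made for illustration, since the record does not specify them.

```python
import random
from collections import Counter


def cooccurrence_counts(gene_set, reference_sets):
    """For each gID, count how many reference sets (rGSSs) contain it."""
    return {g: sum(g in ref for ref in reference_sets) for g in gene_set}


def frequency_distribution(counts, max_k):
    """Number of gIDs observed at each co-occurrence level 0..max_k."""
    tally = Counter(counts.values())
    return [tally.get(k, 0) for k in range(max_k + 1)]


def background_distribution(universe, set_size, reference_sets,
                            n_iter=1000, seed=0):
    """Mean frequency distribution over random gene sets of equal size."""
    rng = random.Random(seed)
    max_k = len(reference_sets)
    pool = sorted(universe)  # random.sample needs a sequence, not a set
    totals = [0] * (max_k + 1)
    for _ in range(n_iter):
        sample = rng.sample(pool, set_size)
        dist = frequency_distribution(
            cooccurrence_counts(sample, reference_sets), max_k)
        totals = [t + d for t, d in zip(totals, dist)]
    return [t / n_iter for t in totals]


def tail(dist, k):
    """Number of gIDs with co-occurrence level >= k."""
    return sum(dist[k:])


def dichotomize(tgss, reference_sets, universe, n_iter=1000, seed=0):
    """Split a tested GSS into 'common' and 'novel' gIDs.

    Assumed rule (not taken from the source): the threshold is the
    smallest level k >= 1 at which the empirical tail exceeds the
    background tail; gIDs at or above it are called common.
    """
    counts = cooccurrence_counts(tgss, reference_sets)
    max_k = len(reference_sets)
    efd = frequency_distribution(counts, max_k)
    bfd = background_distribution(universe, len(tgss), reference_sets,
                                  n_iter, seed)
    threshold = next((k for k in range(1, max_k + 1)
                      if tail(efd, k) > tail(bfd, k)), None)
    if threshold is None:
        common = set()
    else:
        common = {g for g, c in counts.items() if c >= threshold}
    novel = set(tgss) - common
    return threshold, common, novel, efd, bfd


# Toy data (hypothetical labels): 100 gIDs, 5 reference signatures.
universe = {f"G{i}" for i in range(100)}
references = [{"G0", "G1", f"G{10 + 2 * i}", f"G{11 + 2 * i}"}
              for i in range(5)]
tested = ["G0", "G1", "G50", "G51"]

threshold, common, novel, efd, bfd = dichotomize(tested, references, universe)
print("threshold:", threshold)
print("common:", sorted(common))
print("novel:", sorted(novel))
```

Sampling random sets of the same size as the tested signature keeps the empirical and background distributions directly comparable; a real analysis would also need a defensible choice of gene universe and of significance criterion.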
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Ow, Ghim Siong
Kuznetsov, Vladimir Andreevich
format Article
author Ow, Ghim Siong
Kuznetsov, Vladimir Andreevich
author_sort Ow, Ghim Siong
title Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers
publishDate 2018
url https://hdl.handle.net/10356/88541
http://hdl.handle.net/10220/45907
_version_ 1725985790313365504