Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers
Background: The lack of consensus among reported gene signature subsets (GSSs) in multi-gene biomarker discovery studies is often a concern for researchers and clinicians. Subsequently, it discourages larger scale prospective studies, prevents the translation of such knowledge into a practical clini...
Main Authors: | Ow, Ghim Siong; Kuznetsov, Vladimir Andreevich |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2018 |
Subjects: | DRNTU::Engineering::Computer science and engineering; Breast Cancer; Ovarian Cancer |
Online Access: | https://hdl.handle.net/10356/88541 http://hdl.handle.net/10220/45907 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-88541 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-88541 2022-02-16T16:27:19Z Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers Ow, Ghim Siong Kuznetsov, Vladimir Andreevich School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering Breast Cancer Ovarian Cancer ASTAR (Agency for Sci., Tech. and Research, S’pore) Published version 2018-09-10T05:52:08Z 2019-12-06T17:05:38Z 2018-09-10T05:52:08Z 2019-12-06T17:05:38Z 2015 Journal Article Ow, G. S., & Kuznetsov, V. A. (2015). Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers. BMC Genomics, 16(Suppl 7), S2-. doi:10.1186/1471-2164-16-S7-S2 https://hdl.handle.net/10356/88541 http://hdl.handle.net/10220/45907 10.1186/1471-2164-16-S7-S2 26100469 en BMC Genomics © 2015 Ow and Kuznetsov; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. 14 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering; Breast Cancer; Ovarian Cancer |
description |
Background: The lack of consensus among reported gene signature subsets (GSSs) in multi-gene biomarker discovery studies is often a concern for researchers and clinicians. Subsequently, it discourages larger scale prospective studies, prevents the translation of such knowledge into a practical clinical setting and ultimately hinders the progress of the field of biomarker-based disease classification, prognosis and prediction. Methods: We define all "gene identificators" (gIDs) as constituents of the entire potential disease biomarker space. For each gID in a GSS of interest ("tested GSS"/tGSS), our method counts the empirical frequency of gID co-occurrences/overlaps in other reference GSSs (rGSSs) and compares it with the expected frequency generated by a randomized sampling procedure. Comparison of the empirical frequency distribution (EFD) with the expected background frequency distribution (BFD) allows the gIDs within the tGSS to be dichotomized into statistically novel (SN) and statistically common (SC) gIDs. Results: We identify SN or SC biomarkers for tGSSs obtained from previous studies of high-grade serous ovarian cancer (HG-SOC) and breast cancer (BC). For each tGSS, the EFD of gID co-occurrences/overlaps with other rGSSs is characterized by a scale- and context-dependent Pareto-like frequency distribution function. Our results indicate that while there is little overlap between our tGSS and any individual rGSS, comparison of the EFD with the BFD suggests that beyond a confidence threshold, tested gIDs become more common in rGSSs than expected. This validates the use of our tGSS as individual or combined prognostic factors. Our method identifies SN and SC genes of a 36-gene prognostic signature that stratifies HG-SOC patients into subgroups with low, intermediate or high risk of the disease outcome. Using 70 BC rGSSs, the method also predicted SN and SC BC prognostic genes from the tested obesity and IGF1 pathway GSSs. 
Conclusions: Our method provides a strategy that identifies/predicts, within a tGSS of interest, gID subsets that are either SN or SC when compared to other rGSSs. Practically, our results suggest that the IGF1 signature genes are more strongly associated with the 70 BC rGSSs than the genes of the obesity-associated signature. Furthermore, both SC and SN genes in both signatures could be considered as prospective prognostic biomarkers of BCs that stratify patients into groups at low or high risk of cancer development. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering; Ow, Ghim Siong; Kuznetsov, Vladimir Andreevich |
format |
Article |
author |
Ow, Ghim Siong; Kuznetsov, Vladimir Andreevich |
author_sort |
Ow, Ghim Siong |
title |
Multiple signatures of a disease in potential biomarker space : getting the signatures consensus and identification of novel biomarkers |
publishDate |
2018 |
url |
https://hdl.handle.net/10356/88541 http://hdl.handle.net/10220/45907 |
_version_ |
1725985790313365504 |
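
The Methods paragraph of the description compares, for each gID in a tested signature, the observed number of reference signatures containing it against a background obtained by randomized sampling. The Python sketch below illustrates that general idea only. It is not the authors' implementation: the function names, the `universe` argument (the full gID space), the tail-probability rule and the `alpha` cutoff are all assumptions made for illustration, since the record does not state how the confidence threshold is computed.

```python
import random
from collections import Counter


def overlap_counts(tgss, rgss_list):
    """For each gID in the tested signature, count the reference signatures containing it."""
    return {g: sum(g in r for r in rgss_list) for g in tgss}


def background_distribution(size, rgss_list, universe, n_iter=2000, seed=0):
    """Estimate the background frequency distribution (BFD) of overlap counts.

    Random signatures of the same size as the tested one are drawn from the
    gID universe, and the fraction of drawn gIDs seen at each overlap count
    is recorded.
    """
    rng = random.Random(seed)
    pool = sorted(universe)  # fixed order keeps the sampling reproducible
    totals = Counter()
    for _ in range(n_iter):
        sample = rng.sample(pool, size)
        totals.update(overlap_counts(sample, rgss_list).values())
    n_drawn = n_iter * size
    return {count: hits / n_drawn for count, hits in totals.items()}


def classify(tgss, rgss_list, universe, alpha=0.05, n_iter=2000, seed=0):
    """Label each tested gID 'SC' (statistically common) or 'SN' (statistically novel).

    Assumed rule (not taken from the source): a gID is SC when the background
    probability of an overlap count at least as large as the observed one is
    at most alpha; otherwise it is SN.
    """
    observed = overlap_counts(tgss, rgss_list)
    bfd = background_distribution(len(observed), rgss_list, universe, n_iter, seed)

    def tail(k):
        # Background probability of seeing an overlap count >= k
        return sum(p for count, p in bfd.items() if count >= k)

    return {g: ("SC" if tail(k) <= alpha else "SN") for g, k in observed.items()}
```

With a toy universe of 100 gIDs in which only two genes appear in every reference signature, a tested gene shared by all references is labelled SC and a gene absent from all of them is labelled SN. Real use would need the actual rGSS collections and a threshold chosen as in the published article.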