Integrating node embeddings and biological annotations for genes to predict disease-gene associations

Bibliographic Details
Main Authors:	Ata, Sezin Kircali, Ou-Yang, Le, Fang, Yuan, Kwoh, Chee-Keong, Wu, Min, Li, Xiao-Li
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2019
Subjects:	DRNTU::Engineering::Computer science and engineering; Disease Gene Prediction; Node Embeddings
Online Access:	https://hdl.handle.net/10356/105988 http://hdl.handle.net/10220/48817 http://dx.doi.org/10.1186/s12918-018-0662-y
Institution:	Nanyang Technological University

Citation:	Ata, S. K., Ou-Yang, L., Fang, Y., Kwoh, C.-K., Wu, M., & Li, X.-L. (2018). Integrating node embeddings and biological annotations for genes to predict disease-gene associations. BMC Systems Biology, 12(S9), 138. doi:10.1186/s12918-018-0662-y
Funding:	MOE (Ministry of Education, Singapore)
Version:	Published version, 14 p. (application/pdf)
Collection:	DR-NTU (NTU Library, Singapore)
Rights:	© 2018 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background: Predicting disease-causative genes (or simply, disease genes) plays a critical role in understanding the genetic basis of human diseases and in providing guidelines for disease treatment. Various computational methods have been proposed for disease gene prediction, and with the increasing availability of biological information for genes, there is strong motivation to leverage these valuable data sources and extract useful information for accurately predicting disease genes.

Results: We present an integrative framework called N2VKO to predict disease genes. First, we learn node embeddings for genes from a protein-protein interaction (PPI) network by adapting the well-known representation learning method node2vec. Second, we combine the learned node embeddings with various biological annotations to form rich feature representations for genes, and subsequently build binary classification models for disease gene prediction. Finally, because the data for disease gene prediction are usually imbalanced (i.e. a specific disease has far fewer causative genes than non-causative genes), we address this serious imbalance by applying oversampling techniques for imbalanced data correction to improve prediction performance. Comprehensive experiments demonstrate that N2VKO significantly outperforms four state-of-the-art methods for disease gene prediction across seven diseases.

Conclusions: In this study, we show that node embeddings learned from PPI networks work well for disease gene prediction, and that integrating node embeddings with other biological annotations further improves the performance of classification models. Moreover, oversampling techniques for imbalance correction further enhance prediction performance. In addition, a literature search on the predicted disease genes also demonstrates the effectiveness of the proposed N2VKO framework for disease gene prediction.
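The Results paragraph describes a three-stage pipeline: node embeddings from a PPI network, feature concatenation with annotations, and oversampling of the minority (disease-gene) class. The Python sketch below illustrates the mechanics of each stage under stated assumptions; it is not the authors' N2VKO implementation. `biased_walk` follows node2vec's second-order walk with return parameter `p` and in-out parameter `q`; in a full pipeline the walks would be fed to a skip-gram (word2vec) model to obtain embeddings, a step omitted here. `build_features` assumes annotations are encoded as a binary vector over a hypothetical term vocabulary, and `smote` implements SMOTE, chosen here only as one common oversampling technique; the record does not specify which oversampling methods N2VKO uses. Gene and term names in the usage block are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set

# Undirected PPI network as adjacency sets (gene -> interacting genes).
Graph = Dict[str, Set[str]]


def biased_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    """node2vec-style second-order random walk on an unweighted graph.

    Having moved prev -> cur, each neighbour x of cur gets unnormalised
    weight 1/p if x == prev (return), 1 if x is also a neighbour of prev,
    and 1/q otherwise (move outward).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])  # sorted so a seeded rng is reproducible
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step has no prev: uniform
            continue
        prev = walk[-2]
        weights = [1.0 / p if x == prev else 1.0 if x in graph[prev] else 1.0 / q
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def build_features(embedding: Sequence[float], annotations: Sequence[str],
                   vocab: Sequence[str]) -> List[float]:
    """Concatenate a node embedding with a binary annotation vector (assumed encoding)."""
    present = set(annotations)
    return list(embedding) + [1.0 if term in present else 0.0 for term in vocab]


def smote(minority: List[List[float]], n_new: int, k: int,
          rng: random.Random) -> List[List[float]]:
    """SMOTE: place synthetic points on segments between a minority sample
    and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")

    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist2(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    ppi = {"G1": {"G2", "G3"}, "G2": {"G1", "G3"},
           "G3": {"G1", "G2", "G4"}, "G4": {"G3"}}
    walks = [biased_walk(ppi, g, length=5, p=1.0, q=0.5, rng=rng) for g in ppi]
    features = build_features([0.12, -0.40], ["GO:A"], ["GO:A", "GO:B"])
    extra_positives = smote([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], n_new=4, k=2, rng=rng)
    print(walks, features, extra_positives, sep="\n")
```

In a cross-validation setup, synthetic samples would normally be generated from the training folds only, so that evaluation is not inflated by interpolated copies of held-out genes.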

Similar Items