Integrating node embeddings and biological annotations for genes to predict disease-gene associations

Background: Predicting disease causative genes (or simply, disease genes) has played critical roles in understanding the genetic basis of human diseases and further providing disease treatment guidelines. While various computational methods have been proposed for disease gene prediction, with the rece...

Bibliographic Details
Main Authors:	ATA, Sezin Kircali; OU-YANG, Le; FANG, Yuan; KWOH, Chee-Keong; WU, Min; LI, Xiao-Li
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2018
Subjects:	Disease gene prediction; Node embeddings; Feature learning; Oversampling; Protein-protein interaction; Databases and Information Systems; Systems Biology
Online Access:	https://ink.library.smu.edu.sg/sis_research/4281 https://ink.library.smu.edu.sg/context/sis_research/article/5284/viewcontent/s12918_018_0662_y.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5284
record_format	dspace
spelling	sg-smu-ink.sis_research-5284 2019-02-21T08:26:21Z 2018-12-01T08:00:00Z text application/pdf info:doi/10.1186/s12918-018-0662-y http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Disease gene prediction; Node embeddings; Feature learning; Oversampling; Protein-protein interaction; Databases and Information Systems; Systems Biology
description	Background: Predicting disease-causative genes (or simply, disease genes) plays a critical role in understanding the genetic basis of human diseases and in providing disease treatment guidelines. Various computational methods have been proposed for disease gene prediction, and the increasing availability of biological information for genes provides strong motivation to leverage these data sources and extract useful information for accurately predicting disease genes. Results: We present an integrative framework called N2VKO to predict disease genes. First, we learn node embeddings for genes from a protein-protein interaction (PPI) network by adapting the well-known representation learning method node2vec. Second, we combine the learned node embeddings with various biological annotations as a rich feature representation for genes, and subsequently build binary classification models for disease gene prediction. Finally, because the data for disease gene prediction are usually imbalanced (i.e., the number of causative genes for a specific disease is much smaller than the number of non-causative genes), we address this data imbalance by applying oversampling techniques to improve prediction performance. Comprehensive experiments demonstrate that N2VKO significantly outperforms four state-of-the-art methods for disease gene prediction across seven diseases. Conclusions: In this study, we show that node embeddings learned from PPI networks work well for disease gene prediction, and that integrating node embeddings with other biological annotations further improves the performance of classification models. Moreover, oversampling techniques for imbalance correction further enhance prediction performance. In addition, a literature search of the predicted disease genes also supports the effectiveness of the N2VKO framework for disease gene prediction.
format	text
author	ATA, Sezin Kircali; OU-YANG, Le; FANG, Yuan; KWOH, Chee-Keong; WU, Min; LI, Xiao-Li
title	Integrating node embeddings and biological annotations for genes to predict disease-gene associations
publisher	Institutional Knowledge at Singapore Management University
publishDate	2018
url	https://ink.library.smu.edu.sg/sis_research/4281 https://ink.library.smu.edu.sg/context/sis_research/article/5284/viewcontent/s12918_018_0662_y.pdf
_version_	1770574598459359232
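
The description above outlines a pipeline in which per-gene node embeddings are combined with biological annotations into one feature vector, and the minority (disease gene) class is then oversampled before a binary classifier is trained. The following is only a minimal sketch of those two data-preparation steps, not the N2VKO implementation: the gene names, the toy vectors, and the helper names `combine_features` and `random_oversample` are all hypothetical, and plain random duplication stands in for whichever oversampling techniques the authors used, which the record does not name. Learning the embeddings with node2vec and training the classifier are left out.

```python
import random

def combine_features(embeddings, annotations):
    """Concatenate each gene's embedding with its annotation vector.

    Only genes present in both mappings are kept (an assumption of this sketch).
    """
    return {gene: list(embeddings[gene]) + list(annotations[gene])
            for gene in embeddings if gene in annotations}

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size.

    Stand-in for the oversampling step; the record does not say which technique is used.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

# Toy, made-up data: 2-d "embeddings" and 2 binary annotation flags per gene.
embeddings = {"g1": [0.1, 0.9], "g2": [0.8, 0.2], "g3": [0.7, 0.3], "g4": [0.6, 0.1]}
annotations = {"g1": [1, 0], "g2": [0, 1], "g3": [0, 0], "g4": [1, 1]}
labels = {"g1": 1, "g2": 0, "g3": 0, "g4": 0}  # 1 = disease gene (hypothetical)

features = combine_features(embeddings, annotations)
genes = sorted(features)
X = [features[g] for g in genes]
y = [labels[g] for g in genes]

X_bal, y_bal = random_oversample(X, y)
print(len(X_bal), sum(y_bal))  # prints: 6 3
```

In practice the oversampling would be applied to the training split only, so that duplicated positives do not leak into the evaluation data.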

Similar Items