Disease gene classification with metagraph representations

Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge for proteins such as their molecu...

Bibliographic Details
Main Authors: KIRCALI ATA, Sezin, FANG, Yuan, WU, Min, LI, Xiao-Li, XIAO, Xiaokui
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
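Subjects: Disease protein prediction; Metagraph; Protein representations; Protein-protein interaction; Uniprot keywords; Databases and Information Systems; Medicine and Health Sciences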
Online Access: https://ink.library.smu.edu.sg/sis_research/4068
https://ink.library.smu.edu.sg/context/sis_research/article/5071/viewcontent/Disease_gene_manuscript.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5071
record_format dspace
spelling sg-smu-ink.sis_research-5071
2019-02-07T04:15:16Z
Disease gene classification with metagraph representations
KIRCALI ATA, Sezin
FANG, Yuan
WU, Min
LI, Xiao-Li
XIAO, Xiaokui
Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge about proteins, such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network, and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics on the PPIK network, we further propose to represent proteins with metagraphs. Unlike a traditional network motif or subgraph, a metagraph can capture a particular topological arrangement involving the interactions/associations between both proteins and keywords. Based on the novel metagraph representations for proteins, we further build classifiers for disease protein classification through supervised learning. Our experiments on three different PPI databases demonstrate that the proposed method consistently improves disease protein prediction across various classifiers, by 15.3% in AUC on average. It outperforms the baselines, including diffusion-based methods (e.g., RWR) and module-based methods, by 13.8–32.9% for overall disease protein prediction. For predicting breast cancer genes, it outperforms RWR, PRINCE and the module-based baselines by 6.6–14.2%. Finally, our predictions also turn out to have better correlations with literature findings from PubMed.
2017-12-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/4068
info:doi/10.1016/j.ymeth.2017.06.036
https://ink.library.smu.edu.sg/context/sis_research/article/5071/viewcontent/Disease_gene_manuscript.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Disease protein prediction
Metagraph
Protein representations
Protein-protein interaction
Uniprot keywords
Databases and Information Systems
Medicine and Health Sciences
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Disease protein prediction
Metagraph
Protein representations
Protein-protein interaction
Uniprot keywords
Databases and Information Systems
Medicine and Health Sciences
description Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge about proteins, such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network, and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics on the PPIK network, we further propose to represent proteins with metagraphs. Unlike a traditional network motif or subgraph, a metagraph can capture a particular topological arrangement involving the interactions/associations between both proteins and keywords. Based on the novel metagraph representations for proteins, we further build classifiers for disease protein classification through supervised learning. Our experiments on three different PPI databases demonstrate that the proposed method consistently improves disease protein prediction across various classifiers, by 15.3% in AUC on average. It outperforms the baselines, including diffusion-based methods (e.g., RWR) and module-based methods, by 13.8–32.9% for overall disease protein prediction. For predicting breast cancer genes, it outperforms RWR, PRINCE and the module-based baselines by 6.6–14.2%. Finally, our predictions also turn out to have better correlations with literature findings from PubMed.
format text
author KIRCALI ATA, Sezin
FANG, Yuan
WU, Min
LI, Xiao-Li
XIAO, Xiaokui
title Disease gene classification with metagraph representations
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/4068
https://ink.library.smu.edu.sg/context/sis_research/article/5071/viewcontent/Disease_gene_manuscript.pdf