Identifying protein complexes from heterogeneous biological data

With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions, predicting novel protein–protein interactions and so on. In this paper, we present an integrat...

Bibliographic Details
Main Authors: Wu, Min, Xie, Zhipeng, Li, Xiaoli, Kwoh, Chee Keong, Zheng, Jie
Format: Article
Language: English
Published: 2013
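Subjects: DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences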
Online Access: https://hdl.handle.net/10356/81319
http://hdl.handle.net/10220/18179
Institution: Nanyang Technological University
id sg-ntu-dr.10356-81319
record_format dspace
spelling sg-ntu-dr.10356-81319 2020-03-07T11:48:53Z
Identifying protein complexes from heterogeneous biological data
Wu, Min
Xie, Zhipeng
Li, Xiaoli
Kwoh, Chee Keong
Zheng, Jie
DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
MOE (Min. of Education, S’pore)
Published version
2013-12-09T10:21:09Z
2019-12-06T14:28:20Z
2013
Journal Article
Wu, M., Xie, Z., Li, X., Kwoh, C. K., & Zheng, J. (2013). Identifying protein complexes from heterogeneous biological data. Proteins: Structure, Function, and Bioinformatics, 81(11), 2023-2033.
0887-3585
https://hdl.handle.net/10356/81319
http://hdl.handle.net/10220/18179
10.1002/prot.24365
en
Proteins: Structure, Function, and Bioinformatics
© 2013 Wiley Periodicals, Inc. This paper was published in Proteins: Structure, Function, and Bioinformatics and is made available as an electronic reprint (preprint) with permission of Wiley Periodicals, Inc. The paper can be found at the following official DOI: http://dx.doi.org/10.1002/prot.24365. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.
12 p.
application/pdf
application/octet-stream
text/plain
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
description With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions, predicting novel protein–protein interactions and so on. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources could effectively improve the coverage of current insufficient protein interactome (the false negative issue), it could also introduce potential false-positive interactions that could hurt the performance of protein complex prediction. Our proposed InteHC method can effectively address these issues to facilitate accurate protein complex prediction and it is summarized into the following three steps. First, for each individual source/feature, InteHC computes the matrices to store the affinity scores between a protein pair that indicate their propensity to interact or co-complex relationship. Second, InteHC computes a final score matrix, which is the weighted sum of affinity scores from individual sources. In particular, the weights indicating the reliability of individual sources are learned from a supervised model (i.e., a linear ranking SVM). Finally, a hierarchical clustering algorithm is performed on the final score matrix to generate clusters as predicted protein complexes. In our experiments, we compared the results collected by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state-of-the-art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC.
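The three steps in the description (per-source affinity matrices, a weighted sum of those matrices, and hierarchical clustering on the result) can be illustrated with a minimal sketch. This is not the authors' InteHC code. The toy scores, the source weights, the average-linkage rule, and the `min_affinity` stopping threshold are all assumptions made for illustration; the record does not state the linkage or stopping criterion, and in the paper the weights are learned with a linear ranking SVM rather than fixed by hand.

```python
from itertools import combinations


def symmetric_matrix(n, pair_scores):
    """Build an n x n symmetric affinity matrix from {(i, j): score}."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), score in pair_scores.items():
        matrix[i][j] = matrix[j][i] = score
    return matrix


def combine_affinities(matrices, weights):
    """Step 2: weighted sum of the per-source affinity matrices."""
    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * matrix[i][j]
    return combined


def average_affinity(cluster_a, cluster_b, scores):
    """Mean pairwise score between two clusters (average linkage, assumed)."""
    total = sum(scores[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clusters(scores, min_affinity):
    """Step 3: repeatedly merge the two most similar clusters.

    Merging stops once no pair of clusters has an average affinity of at
    least min_affinity (a hypothetical stopping rule).
    """
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > 1:
        best_pair, best_score = None, min_affinity
        for a, b in combinations(range(len(clusters)), 2):
            score = average_affinity(clusters[a], clusters[b], scores)
            if score >= best_score:
                best_pair, best_score = (a, b), score
        if best_pair is None:
            break
        a, b = best_pair
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Step 1 (toy data): two hypothetical sources scoring five proteins, 0..4.
reliable = symmetric_matrix(5, {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.7, (3, 4): 0.9})
noisy = symmetric_matrix(5, {(0, 1): 0.5, (2, 3): 1.0})  # (2, 3) plays a false positive

# Hand-picked weights standing in for the learned source reliabilities.
combined = combine_affinities([reliable, noisy], weights=[0.8, 0.2])
complexes = hierarchical_clusters(combined, min_affinity=0.3)
print(complexes)  # [[0, 1, 2], [3, 4]]
```

With the noisy source down-weighted, its spurious link between proteins 2 and 3 contributes only 0.2 to the combined score, which is below the threshold, so the two groups are not merged.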
format Article
author Wu, Min
Xie, Zhipeng
Li, Xiaoli
Kwoh, Chee Keong
Zheng, Jie
author_sort Wu, Min
title Identifying protein complexes from heterogeneous biological data
publishDate 2013
url https://hdl.handle.net/10356/81319
http://hdl.handle.net/10220/18179
_version_ 1681036549512232960