A topological approach for protein classification

A protein's function and dynamics are closely related to its sequence and structure. However, predicting a protein's function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity...

Bibliographic Details
Main Authors: Cang, Zixuan; Mu, Lin; Wu, Kedi; Opron, Kristopher; Xia, Kelin; Wei, Guo-Wei
Other Authors: School of Physical and Mathematical Sciences
Format: Article
Language: English
Published: 2016
Subjects: persistent homology; machine learning
Online Access: https://hdl.handle.net/10356/82112
http://hdl.handle.net/10220/41120
http://www.degruyter.com/view/j/mlbmb.2015.3.issue-1/mlbmb-2015-0009/mlbmb-2015-0009.xml?format=INT
Institution: Nanyang Technological University
id sg-ntu-dr.10356-82112
record_format dspace
spelling sg-ntu-dr.10356-82112 2023-02-28T19:32:21Z A topological approach for protein classification Cang, Zixuan Mu, Lin Wu, Kedi Opron, Kristopher Xia, Kelin Wei, Guo-Wei School of Physical and Mathematical Sciences persistent homology machine learning A protein's function and dynamics are closely related to its sequence and structure. However, predicting a protein's function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug bound and unbound M2 channels. Secondly, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Thirdly, the identification of all alpha, all beta, and alpha-beta protein domains is carried out using 900 proteins. We have found an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples and 246 tasks over 11944 samples. Average accuracies of 82% and 73% are attained. The present study establishes computational topology as an independent and effective alternative for protein classification. Published version 2016-08-10T07:49:30Z 2019-12-06T14:46:51Z 2016-08-10T07:49:30Z 2019-12-06T14:46:51Z 2015 Journal Article Cang, Z., Mu, L., Wu, K., Opron, K., Xia, K., & Wei, G.-W. (2015). A topological approach for protein classification. Molecular Based Mathematical Biology, 3(1), 140-162. https://hdl.handle.net/10356/82112 http://hdl.handle.net/10220/41120 http://www.degruyter.com/view/j/mlbmb.2015.3.issue-1/mlbmb-2015-0009/mlbmb-2015-0009.xml?format=INT en Molecular Based Mathematical Biology © 2015 Zixuan Cang et al., licensee De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. 23 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic persistent homology
machine learning
spellingShingle persistent homology
machine learning
Cang, Zixuan
Mu, Lin
Wu, Kedi
Opron, Kristopher
Xia, Kelin
Wei, Guo-Wei
A topological approach for protein classification
description A protein's function and dynamics are closely related to its sequence and structure. However, predicting a protein's function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug bound and unbound M2 channels. Secondly, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Thirdly, the identification of all alpha, all beta, and alpha-beta protein domains is carried out using 900 proteins. We have found an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples and 246 tasks over 11944 samples. Average accuracies of 82% and 73% are attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
author2 School of Physical and Mathematical Sciences
author_facet School of Physical and Mathematical Sciences
Cang, Zixuan
Mu, Lin
Wu, Kedi
Opron, Kristopher
Xia, Kelin
Wei, Guo-Wei
format Article
author Cang, Zixuan
Mu, Lin
Wu, Kedi
Opron, Kristopher
Xia, Kelin
Wei, Guo-Wei
author_sort Cang, Zixuan
title A topological approach for protein classification
title_short A topological approach for protein classification
title_full A topological approach for protein classification
title_fullStr A topological approach for protein classification
title_full_unstemmed A topological approach for protein classification
title_sort topological approach for protein classification
publishDate 2016
url https://hdl.handle.net/10356/82112
http://hdl.handle.net/10220/41120
http://www.degruyter.com/view/j/mlbmb.2015.3.issue-1/mlbmb-2015-0009/mlbmb-2015-0009.xml?format=INT
_version_ 1759856024662048768