Improving taxonomy-based protein fold recognition by using global and local features

Bibliographic Details
Main Authors: Yang, Jian-Yi, Chen, Xin
Other Authors: School of Physical and Mathematical Sciences
Format: Article
Language: English
Published: 2013
Online Access: https://hdl.handle.net/10356/100442
http://hdl.handle.net/10220/17877
Institution: Nanyang Technological University
Description
Summary: Fold recognition from amino acid sequences plays an important role in identifying protein structures and functions. The taxonomy-based method, which classifies a query protein into one of the known folds, has been shown to be very promising for protein fold recognition. However, extracting a set of highly discriminative features from amino acid sequences remains a challenging problem. To address this problem, we developed a new taxonomy-based protein fold recognition method called TAXFOLD. It extensively exploits the sequence evolution information from PSI-BLAST profiles and the secondary structure information from PSIPRED profiles. A comprehensive set of 137 features is constructed, which allows for the depiction of both global and local characteristics of PSI-BLAST and PSIPRED profiles. We tested TAXFOLD on four datasets and compared it with several major existing taxonomic methods for fold recognition. Its recognition accuracies range from 79.6% to 90% for 27, 95, and 194 folds, achieving an average 6.9% improvement over the best available taxonomic method. Further testing on the Lindahl benchmark dataset shows that TAXFOLD is comparable with the best conventional template-based threading method at the SCOP fold level. These experimental results demonstrate that the proposed set of features is highly beneficial to protein fold recognition.