Prediction of protein structural classes for low-homology sequences based on predicted secondary structure
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2013 |
Online Access: | https://hdl.handle.net/10356/101219 http://hdl.handle.net/10220/17874 |
Institution: | Nanyang Technological University |
Summary: | Background: Prediction of protein structural classes (α, β, α + β and α/β) from amino acid
sequences is of great importance, as it benefits the study of protein function, regulation and
interactions. Many methods have been developed for high-homology protein sequences, and their
prediction accuracies can reach up to 90%. However, for low-homology sequences whose
average pairwise sequence identity lies between 20% and 40%, they perform relatively poorly,
often yielding prediction accuracies below 60%.
Results: We propose a new method to predict protein structural classes on the basis of features
extracted from the predicted secondary structures of proteins rather than directly from their
amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein
sequence. Then, the chaos game representation is employed to represent the predicted secondary
structure as two time series, from which we generate a comprehensive set of 24 features using
recurrence quantification analysis, K-string based information entropy and segment-based analysis. The
resulting feature vectors are finally fed into a simple yet powerful Fisher’s discriminant algorithm
for the prediction of protein structural classes. We tested the proposed method on three
low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%,
83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method
consistently performs better on all the tested datasets, with overall accuracy improvements
ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at
http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the
design of a comprehensive feature set on the predicted secondary structure sequences, which is
capable of characterizing the sequence order information, local interactions of the secondary
structural elements, and spatial arrangements of α helices and β strands. Thus, it is a valuable
method for predicting protein structural classes, particularly for low-homology amino acid sequences. |
---|
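
The chaos game representation step described in the summary can be illustrated with a minimal sketch. The summary does not give the geometry, so the following are assumptions made only for illustration: the predicted secondary structure is a string over H (helix), E (strand) and C (coil), the three states sit at the corners of an equilateral triangle, and the walk starts at the triangle's centroid. The function name `cgr_time_series` and the vertex assignment are hypothetical and may differ from what the authors' web server does.

```python
import math

# Assumed (hypothetical) vertex assignment for the three secondary
# structure states; the summary does not specify these coordinates.
VERTICES = {
    "H": (0.0, 0.0),                 # helix
    "E": (1.0, 0.0),                 # strand
    "C": (0.5, math.sqrt(3) / 2),    # coil
}

# Assumed starting point: the centroid of the triangle.
CENTROID = (
    sum(v[0] for v in VERTICES.values()) / 3,
    sum(v[1] for v in VERTICES.values()) / 3,
)


def cgr_time_series(ss, start=CENTROID):
    """Map a secondary structure string to two time series (x and y).

    Each point is the midpoint between the previous point and the
    vertex assigned to the current symbol, which is the standard
    chaos game iteration.
    """
    x, y = start
    xs, ys = [], []
    for symbol in ss:
        vx, vy = VERTICES[symbol]  # raises KeyError on unknown symbols
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    xs, ys = cgr_time_series("CCHHHHEEEC")
    print(len(xs), len(ys))
```

The two returned lists play the role of the two time series from which features would then be extracted; the feature extraction (recurrence quantification analysis, K-string entropy, segment-based analysis) and the Fisher's discriminant classifier are not shown here.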