Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

Bibliographic Details
Main Authors:	Yang, Jian-Yi; Peng, Zhen-Ling; Chen, Xin
Other Authors:	School of Physical and Mathematical Sciences
Format:	Article
Language:	English
Published:	2010 (added to DR-NTU in 2013)
Subjects:	Mathematical Sciences
Online Access:	https://hdl.handle.net/10356/101219 http://hdl.handle.net/10220/17874
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-101219
record_format	dspace
last_modified	2023-02-28T19:34:14Z
version	Published version
repository_dates	2013-11-27T05:53:26Z; 2019-12-06T20:35:20Z
date_issued	2010
type	Journal Article
citation	Yang, J. Y., Peng, Z. L., & Chen, X. (2010). Prediction of protein structural classes for low-homology sequences based on predicted secondary structure. BMC Bioinformatics, 11(Suppl 1):S9.
issn	1471-2105
doi	10.1186/1471-2105-11-S1-S9
pmid	20122246
rights	© 2010 Yang et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
extent	10 p.
mimetype	application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Mathematical Sciences
description	Background: Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach up to 90%. However, these methods perform relatively poorly on low-homology sequences, whose average pairwise sequence identity lies between 20% and 40%, often yielding prediction accuracies below 60%. Results: We propose a new method that predicts protein structural classes from features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure of each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher’s discriminant algorithm to predict the structural class. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method can be attributed to a comprehensive feature set designed on the predicted secondary structure sequences, which characterizes the sequence order information, the local interactions of the secondary structural elements, and the spatial arrangements of α helices and β strands. It is therefore a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.
author2	School of Physical and Mathematical Sciences
format	Article
author	Yang, Jian-Yi; Peng, Zhen-Ling; Chen, Xin
author_sort	Yang, Jian-Yi
title	Prediction of protein structural classes for low-homology sequences based on predicted secondary structure
publishDate	2013
url	https://hdl.handle.net/10356/101219 http://hdl.handle.net/10220/17874
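The Results paragraph of the description field above says the predicted secondary structure is turned into two time series with the chaos game representation. The following is a minimal Python sketch of one way to do this. It assumes PSIPRED's three states (H, E, C) sit at the corners of an equilateral triangle and that each symbol moves the current point halfway towards its corner. The vertex coordinates, the centroid starting point and the name chaos_game_series are illustrative assumptions, not taken from the paper.

```python
import math

# ASSUMPTION: hypothetical vertex placement, one corner of an equilateral
# triangle per PSIPRED state (H = helix, E = strand, C = coil).
# The paper's exact coordinates may differ.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def chaos_game_series(ss: str) -> tuple[list[float], list[float]]:
    """Map an H/E/C string to two time series (x_i) and (y_i).

    Each symbol moves the current point halfway towards that symbol's
    vertex; the walk starts at the triangle's centroid (an assumption).
    Unknown symbols raise KeyError.
    """
    x, y = 0.5, math.sqrt(3) / 6  # centroid of the triangle above
    xs: list[float] = []
    ys: list[float] = []
    for symbol in ss:
        vx, vy = VERTICES[symbol]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys
```

For example, chaos_game_series("CCHHHHCCEEEC") returns two lists of 12 coordinates each; under these assumptions the x and y lists are the two time series handed on to feature extraction.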
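Recurrence quantification analysis (RQA), named in the description field above, summarises how often a time series revisits similar states. The sketch below computes two standard RQA measures, the recurrence rate and the determinism, from a single coordinate series such as the CGR x or y series. The embedding dimension, delay, distance threshold eps and minimum line length l_min are hypothetical defaults, and the paper may use a different or larger set of RQA measures.

```python
import math


def embed(series: list[float], dim: int, delay: int) -> list[tuple[float, ...]]:
    """Time-delay embedding: (s[i], s[i+delay], ..., s[i+(dim-1)*delay])."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + j * delay] for j in range(dim)) for i in range(max(n, 0))]


def recurrence_matrix(vectors: list[tuple[float, ...]], eps: float) -> list[list[int]]:
    """R[i][j] = 1 when vectors i and j lie within Euclidean distance eps."""
    return [[1 if math.dist(a, b) <= eps else 0 for b in vectors] for a in vectors]


def rqa_measures(series: list[float], dim: int = 3, delay: int = 1,
                 eps: float = 0.1, l_min: int = 2) -> tuple[float, float]:
    """Return (recurrence rate, determinism), excluding the main diagonal.

    ASSUMPTION: dim, delay, eps and l_min are illustrative defaults only.
    """
    r = recurrence_matrix(embed(series, dim, delay), eps)
    n = len(r)
    recurrent = sum(r[i][j] for i in range(n) for j in range(n) if i != j)
    if recurrent == 0:
        return 0.0, 0.0
    # Count recurrent points on diagonal lines of length >= l_min, scanning
    # every upper and lower off-diagonal (both halves, like `recurrent`).
    on_lines = 0
    for offset in range(1, n):
        upper = [r[i][i + offset] for i in range(n - offset)]
        lower = [r[i + offset][i] for i in range(n - offset)]
        for diag in (upper, lower):
            run = 0
            for value in diag + [0]:  # trailing 0 flushes the final run
                if value:
                    run += 1
                else:
                    if run >= l_min:
                        on_lines += run
                    run = 0
    rec_rate = recurrent / (n * n - n)
    determinism = on_lines / recurrent
    return rec_rate, determinism
```

Applying rqa_measures to both the x and the y series gives four numbers per protein; other RQA measures (for example the longest diagonal line) can be read off the same recurrence matrix.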
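The other two feature families named in the description field above, K-string based information entropy and segment-based analysis, can be computed directly on the H/E/C string. A minimal sketch follows. It assumes the entropy is the Shannon entropy (in bits) of overlapping length-K substrings and that the segment features are the counts and mean lengths of helix and strand runs. Both definitions and the function names k_string_entropy and segment_stats are illustrative assumptions, not the paper's exact 24 features.

```python
import math
import re
from collections import Counter


def k_string_entropy(ss: str, k: int) -> float:
    """Shannon entropy (bits) of overlapping length-k substrings of ss.

    ASSUMPTION: one plausible definition of "K-string entropy".
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    words = [ss[i:i + k] for i in range(len(ss) - k + 1)]
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def segment_stats(ss: str) -> dict[str, float]:
    """Number and mean length of helix (H) and strand (E) segments.

    ASSUMPTION: hypothetical segment features for illustration only.
    """
    stats: dict[str, float] = {}
    for state in "HE":
        lengths = [len(m.group()) for m in re.finditer(f"{state}+", ss)]
        stats[f"n_{state}"] = len(lengths)
        stats[f"mean_len_{state}"] = sum(lengths) / len(lengths) if lengths else 0.0
    return stats
```

For instance, segment_stats("CHHHCEECHHC") finds two helix segments (mean length 2.5) and one strand segment (length 2); entropies for several K values and these counts would be concatenated with the RQA measures into one feature vector.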
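The description field above states that the feature vectors are classified with Fisher's discriminant. The sketch below uses the pooled-covariance linear discriminant rule, which is one common way to apply Fisher's discriminant to more than two classes: each class k gets a linear score w_k · x + b_k, with w_k = Σ⁻¹μ_k and b_k = -(1/2) μ_k · w_k + ln π_k, where Σ is the pooled within-class covariance, μ_k the class mean and π_k the class prior. The class name LinearDiscriminant, the ridge term and the pure-Python solver are assumptions made to keep the example self-contained; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LinearDiscriminant:
    """Pooled-covariance linear discriminant (hypothetical helper class)."""

    def __init__(self, ridge: float = 1e-6):
        self.ridge = ridge  # ASSUMPTION: small diagonal term keeps Σ invertible

    def fit(self, X: list[list[float]], y: list[str]) -> "LinearDiscriminant":
        groups: dict[str, list[list[float]]] = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        d, n = len(X[0]), len(X)
        self.means = {k: [sum(col) / len(rows) for col in zip(*rows)]
                      for k, rows in groups.items()}
        # Pooled within-class scatter, divided by (n - number of classes).
        cov = [[0.0] * d for _ in range(d)]
        for k, rows in groups.items():
            mu = self.means[k]
            for row in rows:
                diff = [a - b for a, b in zip(row, mu)]
                for i in range(d):
                    for j in range(d):
                        cov[i][j] += diff[i] * diff[j]
        dof = max(n - len(groups), 1)
        cov = [[cov[i][j] / dof + (self.ridge if i == j else 0.0) for j in range(d)]
               for i in range(d)]
        self.weights = {}
        for k, mu in self.means.items():
            w = solve(cov, mu)  # w_k = Σ^{-1} μ_k
            prior = len(groups[k]) / n
            bias = -0.5 * sum(a * b for a, b in zip(w, mu)) + math.log(prior)
            self.weights[k] = (w, bias)
        return self

    def score(self, x: list[float], k: str) -> float:
        """Linear discriminant score w_k . x + b_k for class k."""
        w, bias = self.weights[k]
        return sum(a * b for a, b in zip(w, x)) + bias

    def predict(self, x: list[float]) -> str:
        """Return the class with the largest discriminant score."""
        return max(self.weights, key=lambda k: self.score(x, k))
```

With the four structural classes as labels (for example "all-α", "all-β", "α+β" and "α/β") and the feature vectors as X, fit learns one linear score per class and predict picks the highest-scoring class. A linear rule like this has few parameters, which may help explain why the abstract calls Fisher's discriminant "simple yet powerful" on small benchmark sets.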

Similar Items