SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs

Existing methods for predicting protein crystallization obtain high accuracy using various types of complemented features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction meth...

Full description

Saved in:
Bibliographic Details
Main Authors: Phasit Charoenkwan, Watshara Shoombuatong, Hua Chin Lee, Jeerayut Chaijaruwanich, Hui Ling Huang, Shinn Ying Ho
Format: Journal
Published: 2018
Online Access:https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84883364817&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/47654
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Chiang Mai University
id th-cmuir.6653943832-47654
record_format dspace
spelling th-cmuir.6653943832-476542018-04-25T08:42:28Z SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs Phasit Charoenkwan Watshara Shoombuatong Hua Chin Lee Jeerayut Chaijaruwanich Hui Ling Huang Shinn Ying Ho Existing methods for predicting protein crystallization obtain high accuracy using various types of complemented features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction method with informative sequence features to provide insights into protein crystallization. This study proposes an ensemble method, SCMCRYS, to predict protein crystallization, for which each classifier is built by using a scoring card method (SCM) with estimating propensity scores of p-collocated amino acid (AA) pairs (p = 0 for a dipeptide). The SCM classifier determines the crystallization of a sequence according to a weighted-sum score. The weights are the composition of the p-collocated AA pairs, and the propensity scores of these AA pairs are estimated using a statistic with optimization approach. SCMCRYS predicts the crystallization using a simple voting method from a number of SCM classifiers. The experimental results show that the single SCM classifier utilizing dipeptide composition with accuracy of 73.90% is comparable to the best previously-developed SVM-based classifier, SVM_POLY (74.6%), and our proposed SVM-based classifier utilizing the same dipeptide composition (77.55%). The SCMCRYS method with accuracy of 76.1% is comparable to the state-of-the-art ensemble methods PPCpred (76.8%) and RFCRYS (80.0%), which used the SVM and Random Forest classifiers, respectively. This study also investigates mutagenesis analysis based on SCM and the result reveals the hypothesis that the mutagenesis of surface residues Ala and Cys has large and small probabilities of enhancing protein crystallizability considering the estimated scores of crystallizability and solubility, melting point, molecular weight and conformational entropy of amino acids in a generalized condition. The propensity scores of amino acids and dipeptides for estimating the protein crystallizability can aid biologists in designing mutation of surface residues to enhance protein crystallizability. The source code of SCMCRYS is available at http://iclab.life.nctu.edu.tw/SCMCRYS/. © 2013 Charoenkwan et al. 2018-04-25T08:42:28Z 2018-04-25T08:42:28Z 2013-09-03 Journal 19326203 2-s2.0-84883364817 10.1371/journal.pone.0072368 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84883364817&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/47654
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
description Existing methods for predicting protein crystallization obtain high accuracy using various types of complemented features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction method with informative sequence features to provide insights into protein crystallization. This study proposes an ensemble method, SCMCRYS, to predict protein crystallization, for which each classifier is built by using a scoring card method (SCM) with estimating propensity scores of p-collocated amino acid (AA) pairs (p = 0 for a dipeptide). The SCM classifier determines the crystallization of a sequence according to a weighted-sum score. The weights are the composition of the p-collocated AA pairs, and the propensity scores of these AA pairs are estimated using a statistic with optimization approach. SCMCRYS predicts the crystallization using a simple voting method from a number of SCM classifiers. The experimental results show that the single SCM classifier utilizing dipeptide composition with accuracy of 73.90% is comparable to the best previously-developed SVM-based classifier, SVM_POLY (74.6%), and our proposed SVM-based classifier utilizing the same dipeptide composition (77.55%). The SCMCRYS method with accuracy of 76.1% is comparable to the state-of-the-art ensemble methods PPCpred (76.8%) and RFCRYS (80.0%), which used the SVM and Random Forest classifiers, respectively. This study also investigates mutagenesis analysis based on SCM and the result reveals the hypothesis that the mutagenesis of surface residues Ala and Cys has large and small probabilities of enhancing protein crystallizability considering the estimated scores of crystallizability and solubility, melting point, molecular weight and conformational entropy of amino acids in a generalized condition. The propensity scores of amino acids and dipeptides for estimating the protein crystallizability can aid biologists in designing mutation of surface residues to enhance protein crystallizability. The source code of SCMCRYS is available at http://iclab.life.nctu.edu.tw/SCMCRYS/. © 2013 Charoenkwan et al.
format Journal
author Phasit Charoenkwan
Watshara Shoombuatong
Hua Chin Lee
Jeerayut Chaijaruwanich
Hui Ling Huang
Shinn Ying Ho
spellingShingle Phasit Charoenkwan
Watshara Shoombuatong
Hua Chin Lee
Jeerayut Chaijaruwanich
Hui Ling Huang
Shinn Ying Ho
SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
author_facet Phasit Charoenkwan
Watshara Shoombuatong
Hua Chin Lee
Jeerayut Chaijaruwanich
Hui Ling Huang
Shinn Ying Ho
author_sort Phasit Charoenkwan
title SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
title_short SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
title_full SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
title_fullStr SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
title_full_unstemmed SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
title_sort scmcrys: predicting protein crystallization using an ensemble scoring card method with estimating propensity scores of p-collocated amino acid pairs
publishDate 2018
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84883364817&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/47654
_version_ 1681423101921853440