Analysis and prediction of pair-wise contacts in protein tertiary structures

Bibliographic Details
Main Author: Yan, Eugene Wenhui
Other Authors: Tan Ching Wai
Format: Final Year Project
Language: English
Published: 2009
Online Access: http://hdl.handle.net/10356/16981
Institution: Nanyang Technological University
Description
Summary: Protein structure prediction has been one of the greatest challenges in computational biology and chemistry. This report presents a statistical analysis of the secondary structure-related patterns exhibited by non-local contact-pairs, followed by an investigation of a simple predictive model that uses correlated mutational behavior derived from multiple sequence alignments of homologues, with a position-specific scoring matrix (PSSM) as the scoring function. The initial Exploratory Data Analysis phase revealed distinctive patterns in the formation of contact-pairs in proteins from SCOP classes A, B, C and D. By studying the characteristics of contact-pairs in known PDB structures, mathematical functions could be devised to serve as general estimators of contact-pair occurrence within a given protein sequence. Measuring the frequency of residue pairings also suggested that prediction models could assign probabilities that favor energetically favorable pairings, since these should be more likely to form contacts. The implemented prediction model yielded only a slight improvement of 2 to 14 percent over random assignment. The model was judged naïve because it lacks weighted parameters that could separate the signals of true contacts from the background noise in graphical plots. It also highlighted a problem common to most prediction techniques in comparative modeling: the large number of false positives that reduces accuracy. Nevertheless, it showed that PSSM is a viable late-stage scoring mechanism for computing correlation coefficient values and merits further research.
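The summary does not specify how PSSM values feed into the correlation coefficient. The sketch below assumes one simple reading, not the report's confirmed method. For each pair of alignment columns i and j, it pairs the PSSM scores of the residues each homologue carries at those columns and computes a Pearson correlation across sequences. It then ranks column pairs at least `min_separation` apart (an assumed cut-off for "non-local") and compares accuracy with a random-assignment baseline (true contacts divided by candidate pairs). The PSSM layout (one dict of residue-to-score per column), the function names, and the thresholds are all hypothetical.

```python
import math
from itertools import combinations

# Assumed (hypothetical) PSSM layout: pssm[i][aa] is the score of residue `aa`
# at alignment column i. Every residue letter in the MSA must have an entry.


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def column_correlation(msa, pssm, i, j):
    """Correlate the PSSM scores of residues in columns i and j across homologues.

    Sequences with a gap ('-') in either column are skipped; fewer than three
    usable sequences gives 0.0 (no evidence of co-variation).
    """
    xs, ys = [], []
    for seq in msa:
        a, b = seq[i], seq[j]
        if a == "-" or b == "-":
            continue
        xs.append(pssm[i][a])
        ys.append(pssm[j][b])
    if len(xs) < 3:
        return 0.0
    return pearson(xs, ys)


def predict_contacts(msa, pssm, min_separation=6, top_k=10):
    """Return the top_k non-local column pairs ranked by correlation (hypothetical cut-offs)."""
    length = len(msa[0])
    scored = [
        (column_correlation(msa, pssm, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    scored.sort(reverse=True)  # highest correlation first
    return [(i, j) for _, i, j in scored[:top_k]]


def accuracy_vs_random(predicted, true_contacts, length, min_separation=6):
    """Return (prediction accuracy, accuracy expected from random assignment)."""
    candidates = sum(
        1 for i, j in combinations(range(length), 2) if j - i >= min_separation
    )
    hits = sum(1 for pair in predicted if pair in true_contacts)
    accuracy = hits / len(predicted) if predicted else 0.0
    random_accuracy = len(true_contacts) / candidates
    return accuracy, random_accuracy
```

In a toy alignment where columns 0 and 7 co-vary perfectly, `predict_contacts` ranks (0, 7) first. `accuracy_vs_random` then reports the gain over the random baseline, which is the kind of figure quoted in the summary.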