Computational analysis of protein tertiary structures

Bibliographic Details
Main Author: Theresia.
Other Authors: Tan Ching Wai
Format: Final Year Project
Language: English
Published: 2009
Online Access: http://hdl.handle.net/10356/17031
Institution: Nanyang Technological University
Description
Summary: Proteins are essential molecules that play important roles in virtually all the biological functions of a cell, one of which is catalysing chemical reactions. These proteins, known as enzymes, work by lowering the activation energy needed to carry out a reaction, thus speeding it up significantly. Only 1% of the residues in a single protein chain contribute to the catalytic reaction; these are known as catalytic residues. It is therefore desirable to learn how to identify these residues and their characteristics. The objective of this research is to identify catalytic residues in protein sequences using protein structural information, as previous studies have shown that structural information yields more accurate predictions than sequence information alone. However, the structural information of a protein is less readily available than its sequence information. In this project, a novel method to obtain structural information from sequence information was introduced. The Structural Center of Mass (SCOM) and Linear Center of Mass (LCOM) were extracted. SCOM is defined as the centroid of the protein sequence, while LCOM is the midpoint of the protein sequence in terms of molecular weight. The correlation between the two features was analyzed to determine whether the method was feasible and could be used to predict catalytic residues. In addition, the correlation between the Conservation Score of a protein and its SCOM was analyzed to investigate whether it could improve the prediction of catalytic residues. The findings show no correlation between LCOM and SCOM; thus, it was not possible to predict the structural information of a protein from its sequence information alone.
It was also observed that catalytic residues were not located close to the LCOM of the protein, whereas 70% of catalytic residues were among the 20% of residues closest to the SCOM. Furthermore, 76% of the catalytic residues were among the 20% of conserved residues closest to the SCOM. Hence, it is concluded that SCOM can be used to identify catalytic residues in a sequence, and that the conservation score should be used together with it to predict catalytic residues.
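
To make the SCOM and LCOM definitions and the "closest 20%" analysis concrete, the following is a minimal sketch in Python, not the author's implementation. It assumes one representative 3D coordinate per residue (for example the C-alpha atom) and one molecular weight per residue. It also assumes that the SCOM centroid is mass-weighted and that the LCOM is the first residue at which the cumulative weight reaches half the total. All function names are hypothetical.

```python
import math


def scom(coords, masses):
    """Structural center of mass: mass-weighted centroid of residue coordinates.

    Assumption: the record only says "centroid"; mass weighting is our choice.
    """
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def lcom_index(masses):
    """Linear center of mass: index of the residue at which the cumulative
    molecular weight along the chain first reaches half of the total."""
    half = sum(masses) / 2.0
    running = 0.0
    for i, m in enumerate(masses):
        running += m
        if running >= half:
            return i
    raise ValueError("masses must be non-empty")


def closest_fraction(coords, center, fraction=0.2):
    """Indices of the given fraction of residues nearest to `center`."""
    k = max(1, math.ceil(fraction * len(coords)))
    order = sorted(range(len(coords)),
                   key=lambda i: math.dist(coords[i], center))
    return set(order[:k])


def coverage(catalytic, selected):
    """Fraction of known catalytic residue indices that fall in `selected`."""
    return len(set(catalytic) & selected) / len(catalytic)


# Toy chain of five residues on a line, all with equal weight (made-up data).
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (10, 0, 0)]
masses = [1.0, 1.0, 1.0, 1.0, 1.0]

center = scom(coords, masses)
near = closest_fraction(coords, center, fraction=0.4)
print(center, lcom_index(masses), sorted(near), coverage([2, 4], near))
```

On real data, `coverage` would be computed per protein against annotated catalytic residues and then aggregated; the 70% and 76% figures in the summary are percentages of that kind. To mirror the conservation-based variant, the residue list could first be filtered to conserved residues before calling `closest_fraction`.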