Topological analysis of protein structures with statistical learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/146104 |
Institution: | Nanyang Technological University |
Summary: | The study of protein structure-function relationships has been a key focus in computational biology. A novel method of protein data analysis uses Persistent Homology Analysis (PHA) as a tool for protein classification. The main method of feature engineering from the generated intervals is a systematic binning approach that characterises topological features. These features are then fed into three types of statistical learning methods: SVM, tree-based methods and neural networks. The protein classification tasks are the classification of hemoglobin molecules in relaxed and taut form (task 1) and the identification of all-alpha, all-beta and alpha-beta protein domains, carried out on 450 and 900 protein samples (tasks 2 and 3 respectively). A modified tree-based approach showed surprisingly stable results and attained the highest overall accuracy, 93.3% and 87.8% for tasks 2 and 3 respectively. |
---|
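The summary mentions binning persistence intervals into features but does not give the scheme. As a rough illustration only, the sketch below assumes equal-width bins over the filtration range and counts how many intervals (bars) overlap each bin. The function name `bin_intervals`, the parameters `max_filtration` and `n_bins`, and the sample bars are all hypothetical and not taken from the project.

```python
from typing import List, Tuple

# One persistence bar as (birth, death) filtration values.
Interval = Tuple[float, float]


def bin_intervals(
    intervals: List[Interval],
    max_filtration: float = 10.0,  # assumed upper end of the filtration range
    n_bins: int = 5,               # assumed number of equal-width bins
) -> List[int]:
    """Count, for each equal-width bin on [0, max_filtration],
    how many bars overlap that bin."""
    width = max_filtration / n_bins
    counts = [0] * n_bins
    for birth, death in intervals:
        for i in range(n_bins):
            lo, hi = i * width, (i + 1) * width
            # A bar overlaps the bin if it is born before the bin ends
            # and dies after the bin starts.
            if birth < hi and death > lo:
                counts[i] += 1
    return counts


# Made-up bars for illustration; real ones would come from a
# persistent homology computation on protein coordinates.
bars = [(0.0, 1.5), (0.5, 4.2), (3.0, 9.0)]
features = bin_intervals(bars, max_filtration=10.0, n_bins=5)
print(features)  # [2, 2, 2, 1, 1]
```

A fixed-length vector like this could then be passed to any of the classifiers named in the summary (SVM, tree-based methods, neural networks). The project's actual bin layout and per-bin statistics may differ.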