Topological analysis of protein structures with statistical learning

Bibliographic Details
Main Author: Lee, Si Xian
Other Authors: PUN Chi Seng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access:https://hdl.handle.net/10356/146104
Institution: Nanyang Technological University
Description
Summary: The study of protein structure-function relationships has been a key focus in computational biology. A novel method of protein data analysis uses Persistent Homology Analysis (PHA) as a tool for protein classification. The main feature-engineering method applied to the generated persistence intervals is a systematic binning approach that characterises the topological features. These features are then used as inputs to three types of statistical learning methods: support vector machines (SVM), tree-based methods and neural networks. The protein classification tasks are the classification of hemoglobin molecules in relaxed and taut forms (task 1) and the identification of all-alpha, all-beta and alpha-beta protein domains (tasks 2 and 3, on 450 and 900 protein samples respectively). A modified tree-based approach gave surprisingly stable results and attained the highest overall accuracies: 93.3% on task 2 and 87.8% on task 3.
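The summary does not spell out the binning scheme, so the following is only a minimal sketch of one common variant, assumed for illustration: the filtration range [0, max_filtration) is split into equal-width bins, and each persistence interval (birth, death) adds to a birth count, a death count and a "spans this bin" count. The names `bin_barcode` and `featurize`, the bin layout, and all parameter values are hypothetical and are not taken from the project. The resulting fixed-length vectors are the kind of input that an SVM, a tree-based classifier or a neural network can consume.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One persistence interval: (birth, death); death may be math.inf.
Interval = Tuple[float, float]


def bin_barcode(intervals: Sequence[Interval], max_filtration: float, n_bins: int) -> List[int]:
    """Bin one barcode into 3 * n_bins counts over [0, max_filtration).

    Assumed layout (hypothetical): births per bin, then deaths per bin, then
    the number of intervals that span each bin completely.
    """
    width = max_filtration / n_bins
    births = [0] * n_bins
    deaths = [0] * n_bins
    spans = [0] * n_bins
    for birth, death in intervals:
        # Clip infinite bars to the end of the filtration range.
        death = min(death, max_filtration)
        b_idx = int(birth // width)
        if 0 <= b_idx < n_bins:
            births[b_idx] += 1
        # A bar clipped at max_filtration never "dies" inside the range.
        d_idx = int(death // width)
        if death < max_filtration and 0 <= d_idx < n_bins:
            deaths[d_idx] += 1
        # Count the bar in every bin it covers from edge to edge.
        for i in range(n_bins):
            if birth <= i * width and death >= (i + 1) * width:
                spans[i] += 1
    return births + deaths + spans


def featurize(barcodes: Dict[int, Sequence[Interval]], max_filtration: float, n_bins: int) -> List[int]:
    """Concatenate binned features for each homology dimension in ascending order."""
    features: List[int] = []
    for dim in sorted(barcodes):
        features.extend(bin_barcode(barcodes[dim], max_filtration, n_bins))
    return features


if __name__ == "__main__":
    # A hypothetical barcode, not real protein data.
    barcode = [(0.0, 1.2), (0.3, 0.8), (0.6, math.inf)]
    print(bin_barcode(barcode, max_filtration=2.0, n_bins=4))
    # [2, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
```

With four bins of width 0.5, two bars are born in the first bin and one in the second; the deaths at 0.8 and 1.2 fall in the second and third bins; and each bin is fully spanned by exactly one bar. Bin count and filtration range are the main tuning knobs: finer bins keep more detail but produce longer, sparser vectors.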