Generalised topological features and machine learning in drug design
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/139051 |
Institution: | Nanyang Technological University |
Summary: | A key step in drug design is predicting the binding affinity between a protein and a ligand. This task can be approached with supervised learning, in which an algorithm is trained on a dataset of protein-ligand pairs and their binding affinities. Previous work has shown that featurizing the protein-ligand data with persistent homology, and then passing the features to machine learning models, can achieve very high accuracy in binding affinity prediction. This work continues that approach in search of models with even better predictive capability. First, two modern approaches to persistent-homology-based featurization are considered: persistent path embedding combined with signature methods, and persistent spectral models. These features are then used as inputs to several machine learning models, including linear models, tree models, deep neural networks and echo state networks. The models are systematically tested on two commonly used datasets, PDBbind-2007 and PDBbind-2016. The combination of persistent spectral featurization and echo state networks performs best and outperforms several existing models in the literature. |