Geometric deep learning for antibiotic discovery

Bibliographic Details
Main Author: Choo, Hou Yee
Other Authors: Xia Kelin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/156881
Institution: Nanyang Technological University
Description
Summary: To reduce laboratory cost and time while increasing the accuracy of new drug identification, artificial intelligence (AI) techniques have been widely applied to drug discovery programs in the pharmaceutical industry. In this article, we propose a geometric deep learning model that uses the Graph Attention Network (GAT) to identify potential new antibiotic candidates. The model was evaluated on several performance metrics: AUC-ROC, accuracy, and the weighted averages of precision, recall, and F1-score. Its performance was then compared with that of other existing geometric deep learning models. To make the comparison fair, undersampling was applied to reduce class imbalance in the data, and 5-fold cross-validation was applied to reduce the variance and bias of the resulting performance metrics. The proposed model outperformed all competing models on every performance metric. This suggests that the proposed model, which places more weight on the neighboring messages most relevant to the atom being updated, is better suited to molecular property identification. An ablation study was also conducted to investigate the contribution of Morgan fingerprints, molecular graph embeddings, and SMILES text embeddings to predicting the molecular property of interest. Morgan fingerprints combined with molecular graph embeddings proved to be the best combination of embeddings for the model.
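The attention idea mentioned in the summary, weighting neighboring messages by their relevance to the atom being updated, can be illustrated with a minimal sketch of a single-head GAT-style update. This is not the project's implementation: the function name `gat_update`, the toy three-atom graph, the identity weight matrix `W`, the attention vector `a`, and the inclusion of a self-loop are all assumptions chosen for illustration, and the final nonlinearity usually applied after aggregation is omitted.

```python
import math

def leaky_relu(x, slope=0.2):
    # LeakyReLU as commonly used for GAT attention scores (slope is an assumption)
    return x if x > 0 else slope * x

def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_update(h, neighbors, W, a, i):
    """Return (attention weights, aggregated features) for atom i.

    Hypothetical helper: single attention head, self-loop included.
    """
    # Linearly transform atom i and each of its neighbors
    z = {j: matvec(W, h[j]) for j in set(neighbors[i]) | {i}}
    # Raw score e_ij = LeakyReLU(a . [z_i || z_j]); list `+` concatenates
    scores = {
        j: leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
        for j in z
    }
    # Softmax over the neighborhood (shifted by the max for stability)
    m = max(scores.values())
    exp = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exp.values())
    alpha = {j: e / total for j, e in exp.items()}
    # Attention-weighted sum of the transformed neighbor features
    dim = len(z[i])
    new_h = [sum(alpha[j] * z[j][k] for j in z) for k in range(dim)]
    return alpha, new_h

# Toy graph: atom 0 is bonded to atoms 1 and 2 (made-up features)
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity transform, for readability
a = [0.5, -0.5, 1.0, 0.25]     # attention vector of length 2 * feature dim

alpha, new_h = gat_update(h, neighbors, W, a, 0)
```

In this toy setup the weights in `alpha` sum to one, and the neighbor with the highest raw score receives the largest share of the update. In a trained model, `W` and `a` would be learned rather than fixed by hand.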