Enzyme catalytic residue prediction using deep learning methods

Identification of catalytic residues in enzymes has important applications ranging from drug discovery to protein engineering. However, locating catalytic residues in the laboratory is time-consuming and costly. Through high-throughput computational methods, potential catalytic residues can be elucidated.

Bibliographic Details
Main Author:	Guan, Jia Sheng
Other Authors:	Mu Yuguang
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Science::Biological sciences
Online Access:	https://hdl.handle.net/10356/171862
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-171862
record_format	dspace
spelling	sg-ntu-dr.10356-171862 2023-11-20T15:32:40Z Enzyme catalytic residue prediction using deep learning methods Guan, Jia Sheng Mu Yuguang School of Biological Sciences YGMu@ntu.edu.sg Science::Biological sciences Bachelor of Science in Biological Sciences 2023-11-14T06:42:31Z 2023 Final Year Project (FYP) Guan, J. S. (2023). Enzyme catalytic residue prediction using deep learning methods. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171862 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Science::Biological sciences
spellingShingle	Science::Biological sciences Guan, Jia Sheng Enzyme catalytic residue prediction using deep learning methods
description	Identification of catalytic residues in enzymes has important applications ranging from drug discovery to protein engineering. However, locating catalytic residues in the laboratory is time-consuming and costly. Through high-throughput computational methods, potential catalytic residues can be elucidated. While many models trained to predict catalytic residues have been published, there are still unexplored combinations of model features and data preparation methods. In this project, graph neural network (GNN) and multi-layer perceptron (MLP) models were constructed to predict catalytic residues. The choice of edge weight equation was found to have a large impact on GNN model performance. Embeddings from a large protein language model, Evolutionary Scale Modeling 2 (ESM-2), were tested and found suitable as features for the MLP and GNN models, rivaling many published models in performance. Atchley factors were investigated as features, but the results hinted that this information might already be included in the ESM-2 embeddings. To address a knowledge gap, structural information from the entire protein complex was considered as a GNN model feature, but it offered no benefit compared with using only monomer structures, as in published models. To address class imbalance, down-sampling of non-catalytic to catalytic residues to a 10:1 ratio was tested, but it did not improve model performance.
author2	Mu Yuguang
author_facet	Mu Yuguang Guan, Jia Sheng
format	Final Year Project
author	Guan, Jia Sheng
author_sort	Guan, Jia Sheng
title	Enzyme catalytic residue prediction using deep learning methods
title_short	Enzyme catalytic residue prediction using deep learning methods
title_full	Enzyme catalytic residue prediction using deep learning methods
title_fullStr	Enzyme catalytic residue prediction using deep learning methods
title_full_unstemmed	Enzyme catalytic residue prediction using deep learning methods
title_sort	enzyme catalytic residue prediction using deep learning methods
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/171862
_version_	1783955521219330048
