PESI: paratope-epitope set interaction for SARS-CoV-2 neutralization prediction

Bibliographic Details
Main Authors: Wan, Zhang; Lin, Zhuoyi; Rashid, Shamima; Ng, Shaun Yue Hao; Yin, Rui; Senthilnath, J.; Kwoh, Chee Keong
Other Authors: College of Computing and Data Science
Format: Conference or Workshop Item
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/178506
Institution: Nanyang Technological University
Description
Summary: Prediction of neutralizing antibodies is important for the development of effective vaccines and antibody-based therapeutics. Traditional methods rely on features based on first principles derived from the binding interface. However, they are burdened by arduous data preprocessing of a limited quantity of protein structures. In comparison, deep learning allows automatic substructure characterization and representation without hand-crafted feature engineering. In particular, large language model (LLM)-based methods predict neutralization using Fv sequences of antibody and antigen. Despite the success of LLMs, incorporating full-length Fv sequences suffers from: 1) inaccurate sequence-level labels in existing datasets, 2) inefficient modeling due to noisy non-contributing motifs, and 3) neglect of non-bonded interactions that play a key role in facilitating epitope-paratope pairing. In this paper, we propose a novel approach that incorporates only the paratope and epitope for antibody-antigen neutralization prediction while adopting a novel set modeling approach that regards the paratope and epitope as bags of residues. Specifically, we hand-craft a dataset containing neutralizing paratope-epitope pairs where epitopes are potentially generalizable to future unseen variants of SARS-CoV-2. Training on such a dataset enables deep learning models to predict neutralizing antibodies for prospective mutated variants of SARS-CoV-2 while also addressing the problem of inaccurate sequence-level labels. A higher modeling efficiency is also achieved by disregarding non-contributing motifs. Furthermore, we propose paratope-epitope set interaction (PESI), a set model inspired by first principles that learns intra-inter non-covalent interactions through a global attention mechanism. To validate PESI, we perform 10-fold cross-validation on our dataset. Experimental results show that PESI achieves a more balanced overall performance and a significant improvement in MCC compared to existing architectures.
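
Note on the metric: MCC here presumably refers to the Matthews correlation coefficient, a standard measure for binary classifiers that stays informative when the classes are imbalanced. The record does not say how the authors computed it, so the following is only a minimal sketch of the textbook formula. The function names and the confusion-matrix counts are hypothetical illustrations, not results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (an assumption here): return 0 when a marginal count
    # is zero and the ratio is otherwise undefined.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, chosen only to contrast the two metrics.
always_negative = dict(tp=0, tn=90, fp=0, fn=10)  # predicts "non-neutralizing" every time
mixed = dict(tp=40, tn=45, fp=5, fn=10)           # makes errors in both classes

for name, counts in [("always_negative", always_negative), ("mixed", mixed)]:
    print(name, "accuracy:", accuracy(**counts), "MCC:", round(mcc(**counts), 3))
```

A classifier that always predicts the majority class can score high accuracy while its MCC is zero, which is why MCC is often reported alongside claims of balanced performance.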