GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features

Bibliographic Details
Main Author: Malik A.
Other Authors: Mahidol University
Format: Article
Published: 2023
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/81652
Institution: Mahidol University
Journal: International Journal of Biological Macromolecules, Vol. 229 (2023), pp. 529-538
Publication Date: 2023-02-28
DOI: 10.1016/j.ijbiomac.2022.12.315
ISSN: 1879-0003, 0141-8130
PubMed ID: 36596370
Scopus EID: 2-s2.0-85145730211
Record Source: Scopus
Repository Record: th-mahidol.81652 (made available 2023-05-19)
Subjects: Biochemistry, Genetics and Molecular Biology
Collection: Mahidol University Institutional Repository (Mahidol University Library, Thailand)

Description:
The cell surface proteins of Gram-positive bacteria are involved in many important biological functions, including the infection of host cells. Owing to their role in virulence, these proteins are also considered strong candidates for drug or vaccine targets. Among the various cell surface proteins of Gram-positive bacteria, LPXTG-like proteins form a major class. These proteins carry a highly conserved C-terminal cell wall sorting signal consisting of an LPXTG sequence motif, a hydrophobic domain, and a positively charged tail. A sortase enzyme targets them to the cell envelope via transpeptidation. A variety of LPXTG-like proteins have been experimentally characterized; however, extensive bacterial genome sequencing has increased their number in public databases without proper annotation. In the absence of experimental characterization, identifying and annotating these sequences is extremely challenging. In this study, we therefore developed GPApred, the first machine learning-based predictor that identifies LPXTG-like proteins from their primary sequences. Using a newly constructed benchmark dataset, we explored several classifiers with five feature encodings and their hybrids. Optimal features were selected using recursive feature elimination, and a support vector machine (SVM) was then trained on them. The performance of the different models was evaluated on independent datasets, and the final model (GPApred) was selected based on its consistency across cross-validation and independent assessment.

GPApred can be an effective tool for predicting LPXTG-like sequences and can further support functional characterization or drug targeting. Availability: https://procarb.org/gpapred/.
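The sorting signal described above (an LPXTG motif, then a hydrophobic domain, then a positively charged tail) can be illustrated with a naive rule-based scan. This is a minimal sketch and not GPApred's method: the window size, hydrophobic residue set, run length, tail length, and K/R count are all hypothetical thresholds, and the regex matches only the strict LPXTG form rather than the broader LPXTG-like variants.

```python
import re

# Hypothetical thresholds for illustration only; these are NOT GPApred
# parameters and have not been validated on real proteins.
C_TERMINAL_WINDOW = 50    # only search for the motif near the C-terminus
MIN_HYDROPHOBIC_RUN = 12  # minimum hydrophobic stretch after the motif
TAIL_LENGTH = 8           # residues treated as the C-terminal tail
MIN_POSITIVE_IN_TAIL = 2  # minimum number of K/R residues in that tail

# Strict LPXTG only; "LPXTG-like" variants would need a broader pattern.
MOTIF = re.compile(r"LP[A-Z]TG")
HYDROPHOBIC_RUN = re.compile(r"[AILMFVWCG]{%d,}" % MIN_HYDROPHOBIC_RUN)


def has_sorting_signal(seq: str) -> bool:
    """Naive rule-based check for an LPXTG-type C-terminal sorting signal."""
    seq = seq.upper()
    tail = seq[-TAIL_LENGTH:]
    if sum(aa in "KR" for aa in tail) < MIN_POSITIVE_IN_TAIL:
        return False  # no positively charged tail
    start = max(0, len(seq) - C_TERMINAL_WINDOW)
    for match in MOTIF.finditer(seq, start):
        # Require a hydrophobic stretch somewhere downstream of the motif.
        if HYDROPHOBIC_RUN.search(seq, match.end()):
            return True
    return False
```

Fixed rules like these are easy to read but brittle, which is one reason to learn the decision from sequence features instead.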
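The workflow outlined in the description (sequence feature encoding, recursive feature elimination, then a support vector machine) can be sketched with scikit-learn. The record does not name GPApred's five encodings or its SVM settings, so this sketch assumes amino acid composition (AAC) as a stand-in encoding, a linear-kernel SVC to rank features inside RFE, an RBF-kernel SVC as the final classifier, and a hypothetical `n_features=10`. The demo uses random toy sequences, not the paper's benchmark dataset.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> np.ndarray:
    """Amino acid composition: fraction of each of the 20 standard residues."""
    seq = seq.upper()
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def build_model(n_features: int = 10) -> Pipeline:
    """Scale features, keep the top-ranked ones via RFE, then fit an SVM."""
    return Pipeline([
        ("scale", StandardScaler()),
        # RFE needs an estimator exposing coef_, hence the linear-kernel SVC.
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=n_features)),
        # Hypothetical final classifier; the record only says "SVM".
        ("svm", SVC(kernel="rbf")),
    ])


if __name__ == "__main__":
    # Toy random sequences for illustration only (not the GPApred benchmark).
    rng = np.random.default_rng(0)
    residues = list(AMINO_ACIDS)
    seqs, labels = [], []
    for i in range(60):
        body = "".join(rng.choice(residues, size=120))
        if i % 2:  # toy "positives" get a hypothetical LPXTG-type C-terminus
            body += "LPETGLAVLIGAVLLAAFSKRRKEKR"
        seqs.append(body)
        labels.append(i % 2)
    X = np.vstack([aac(s) for s in seqs])
    y = np.array(labels)
    scores = cross_val_score(build_model(), X, y, cv=5)
    print(f"5-fold CV accuracy on toy data: {scores.mean():.2f}")
```

Wrapping scaling and RFE inside the pipeline means feature selection is refit within each cross-validation fold, which avoids leaking information from held-out folds into the selected features.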