SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins

Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain l...

Bibliographic Details
Main Author: Ahmad S.
Other Authors: Mahidol University
Format: Article
Published: 2023
Subjects: Multidisciplinary
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/86440
Institution: Mahidol University
id th-mahidol.86440
record_format dspace
spelling th-mahidol.86440 2023-06-19T01:04:45Z SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins Ahmad S. Mahidol University Multidisciplinary 2023-06-18T18:04:45Z 2023-06-18T18:04:45Z 2022-12-01 Article Scientific Reports Vol.12 No.1 (2022) 10.1038/s41598-022-08173-5 20452322 35260777 2-s2.0-85126076923 https://repository.li.mahidol.ac.th/handle/123456789/86440 SCOPUS
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Multidisciplinary
spellingShingle Multidisciplinary
Ahmad S.
SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
description Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several machine learning (ML) methods have been developed for in silico identification of PVPs, these methods have certain limitations. In this study, we therefore propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we used a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate that SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download in the GitHub repository (https://github.com/saeed344/SCORPION).
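The stacking idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the GitHub repository cited above), and everything in it is an assumption made for illustration: two toy descriptors (`aac`, `group_composition`) stand in for the 13 descriptors, a hypothetical nearest-centroid scorer (`CentroidModel`) stands in for the 10 ML algorithms, the data are synthetic, and the two-step feature selection is omitted. It shows only how out-of-fold baseline predictions become probabilistic features for a meta-model, and it assumes every training fold contains both classes.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Coarse physicochemical groups (an illustrative grouping, not the paper's).
GROUPS = {"hydrophobic": "AVLIMFWC", "charged": "KRDEH", "polar": "GSTNQYP"}


def aac(seq):
    """Amino acid composition: relative frequency of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def group_composition(seq):
    """Fraction of residues falling in each coarse physicochemical group."""
    return [sum(seq.count(a) for a in members) / len(seq)
            for members in GROUPS.values()]


class CentroidModel:
    """Hypothetical stand-in learner: scores by distance to class centroids."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        """Pseudo-probability of class 1 (closer to centroid 1 gives > 0.5)."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return 1.0 / (1.0 + math.exp(d1 - d0))


def out_of_fold_pfs(descriptors, seqs, y, k=5):
    """One PF column per descriptor; each value comes from a model that did
    not see that sequence during training."""
    n = len(seqs)
    pfs = [[0.0] * len(descriptors) for _ in range(n)]
    for j, describe in enumerate(descriptors):
        X = [describe(s) for s in seqs]
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            model = CentroidModel().fit([X[i] for i in train],
                                        [y[i] for i in train])
            for i in range(fold, n, k):  # held-out indices of this fold
                pfs[i][j] = model.predict_proba(X[i])
    return pfs


def fit_stack(descriptors, seqs, y, k=5):
    """Train the meta-model on out-of-fold PFs, then refit baselines on all data."""
    meta = CentroidModel().fit(out_of_fold_pfs(descriptors, seqs, y, k), y)
    bases = [CentroidModel().fit([d(s) for s in seqs], y) for d in descriptors]
    return bases, meta


def predict_stack(descriptors, bases, meta, seq):
    """Baseline probabilities for one sequence feed the meta-model."""
    pf = [m.predict_proba(d(seq)) for d, m in zip(descriptors, bases)]
    return meta.predict_proba(pf)


def toy_dataset(n_per_class=20, length=60, seed=0):
    """Synthetic sequences: class 1 enriched in K/E, class 0 in L/V."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for _ in range(n_per_class):
        for label, favoured in ((1, "KE"), (0, "LV")):
            seqs.append("".join(
                rng.choice(favoured) if rng.random() < 0.5
                else rng.choice(AMINO_ACIDS) for _ in range(length)))
            labels.append(label)
    return seqs, labels


DESCRIPTORS = [aac, group_composition]
train_seqs, train_y = toy_dataset(seed=0)
test_seqs, test_y = toy_dataset(n_per_class=10, seed=1)
bases, meta = fit_stack(DESCRIPTORS, train_seqs, train_y)
predictions = [int(predict_stack(DESCRIPTORS, bases, meta, s) > 0.5)
               for s in test_seqs]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"independent-test accuracy on toy data: {accuracy:.2f}")
```

The probabilistic features are generated out of fold so that the meta-model is not trained on predictions the baseline models made for sequences they had already seen. The toy accuracy says nothing about performance on real PVP data.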
author2 Mahidol University
author_facet Mahidol University
Ahmad S.
format Article
author Ahmad S.
author_sort Ahmad S.
title SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
title_short SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
title_full SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
title_fullStr SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
title_full_unstemmed SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
title_sort scorpion is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
publishDate 2023
url https://repository.li.mahidol.ac.th/handle/123456789/86440
_version_ 1781415326451236864