SNARE-CNN : a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data

Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predic...

Bibliographic Details
Main Authors: Le, Nguyen Quoc Khanh; Nguyen, Van-Nui
Other Authors: School of Humanities
Format: Article
Language: English
Published: 2020
Subjects: Humanities::General; Position Specific Scoring Matrix; SNARE Protein Function
Online Access: https://hdl.handle.net/10356/144053
Institution: Nanyang Technological University
id sg-ntu-dr.10356-144053
record_format dspace
spelling sg-ntu-dr.10356-144053 2020-10-12T01:13:33Z Published version. Journal Article.
Le, N. Q. K., & Nguyen, V.-N. (2019). SNARE-CNN : a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data. PeerJ Computer Science, 5, e177. doi:10.7717/peerj-cs.177. ISSN 2376-5992. https://hdl.handle.net/10356/144053
© 2019 The Author(s) (published by PeerJ). This is an open-access article distributed under the terms of the Creative Commons Attribution License. application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Humanities::General
Position Specific Scoring Matrix
SNARE Protein Function
description Deep learning is increasingly used to solve problems across many fields, often with state-of-the-art performance. It can also be applied in bioinformatics to reduce the need for feature extraction while reaching high performance. This study uses deep learning to predict SNARE proteins, which carry out one of the most vital molecular functions in life science. Functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative diseases, mental illness, and cancer). Therefore, building a precise model to identify these proteins is crucial for understanding such diseases and for designing drug targets. Our SNARE-CNN model, which combines two-dimensional convolutional neural networks with position-specific scoring matrix (PSSM) profiles, identifies SNARE proteins with a sensitivity of 76.6%, specificity of 93.5%, accuracy of 89.7%, and Matthews correlation coefficient (MCC) of 0.7 on the cross-validation dataset. We also evaluate the model on an independent dataset, and the results indicate that overfitting was addressed. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. This study provides an effective model for identifying SNARE proteins and a basis for further research applying deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN is freely available at https://github.com/khanhlee/snare-cnn.
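note The four metrics quoted in the description (sensitivity, specificity, accuracy, MCC) all derive from the counts of a binary confusion matrix. The sketch below shows the standard formulas only; it is not taken from the SNARE-CNN repository. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not the paper's data.

```python
import math


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs here).
    """
    sensitivity = safe_div(tp, tp + fn)   # recall on the positive class
    specificity = safe_div(tn, tn + fp)   # recall on the negative class
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Illustrative counts only (not results from the paper).
example = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Because MCC uses all four counts, it stays informative when the classes are imbalanced, which is likely why it is reported alongside accuracy.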
author2 School of Humanities
author_facet School of Humanities
Le, Nguyen Quoc Khanh
Nguyen, Van-Nui
format Article
author Le, Nguyen Quoc Khanh
Nguyen, Van-Nui
author_sort Le, Nguyen Quoc Khanh
title SNARE-CNN : a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data
title_sort snare-cnn : a 2d convolutional neural network architecture to identify snare proteins from high-throughput sequencing data
publishDate 2020
url https://hdl.handle.net/10356/144053
_version_ 1681057067773722624