Analyzing RNA-Seq gene expression data using deep learning approaches for cancer classification
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, Basel, Switzerland, 2022 |
Online Access: | https://eprints.ums.edu.my/id/eprint/32759/1/Analyzing%20RNA-Seq%20gene%20expression%20data%20using%20deep%20learning%20approaches%20for%20cancer%20classification.ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/32759/2/Analyzing%20RNA-Seq%20gene%20expression%20data%20using%20deep%20learning%20approaches%20for%20cancer%20classification.pdf https://eprints.ums.edu.my/id/eprint/32759/ https://www.mdpi.com/2076-3417/12/4/1850/htm https://doi.org/10.3390/app12041850 |
Institution: | Universiti Malaysia Sabah |
Summary: | Ribonucleic acid Sequencing (RNA-Seq) analysis is particularly useful for obtaining insights into differentially expressed genes. However, it is challenging because the data are high-dimensional. Such analysis is a tool for finding underlying patterns in data, e.g., cancer-specific biomarkers. In the past, analyses were performed on RNA-Seq data pertaining to the same cancer class as positive and negative samples, i.e., without samples of other cancer types. To perform multiple cancer type classification and to find differentially expressed genes, data for multiple cancer types need to be analyzed. Several repositories offer RNA-Seq data for various cancer types. In this paper, data from the Mendeley data repository for five cancer types are analyzed. As a first step, RNA-Seq values are converted to 2D images using normalization and zero padding. In the next step, relevant features are extracted and selected using Deep Learning (DL). In the last phase, classification is performed using eight DL algorithms. Results and discussion are based on four different splitting strategies and k-fold cross-validation for each DL classifier. Furthermore, a comparative analysis is performed with state-of-the-art techniques discussed in the literature. The results demonstrated that the classifiers performed best with a 70–30 split, and that the Convolutional Neural Network (CNN) achieved the best overall results. Hence, CNN is the best DL model for classification among the eight DL models studied, and is easy to implement and simple to understand. |
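The first step described in the summary, converting RNA-Seq values to 2D images using normalization and zero padding, can be illustrated with a minimal sketch. This record does not state which normalization or image size the paper uses, so the min-max scaling, the square image shape, and the function name `expression_to_image` below are assumptions made for illustration only, not the authors' implementation.

```python
import math
import numpy as np


def expression_to_image(values):
    """Turn a 1D vector of expression values into a square 2D array.

    Assumptions (not taken from the paper): min-max normalization to
    [0, 1] and zero padding up to the next perfect square.
    """
    x = np.asarray(values, dtype=np.float64)
    n = x.size
    if n == 0:
        raise ValueError("expected at least one expression value")

    # Min-max normalization; a constant vector maps to all zeros.
    span = x.max() - x.min()
    norm = (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # Smallest side length whose square can hold all n values.
    side = math.isqrt(n)
    if side * side < n:
        side += 1

    # Zero padding: unused trailing cells stay at 0.
    padded = np.zeros(side * side, dtype=np.float64)
    padded[:n] = norm
    return padded.reshape(side, side)


# Hypothetical sample with five gene expression values.
image = expression_to_image([0.0, 5.0, 10.0, 2.5, 7.5])
print(image.shape)
```

An array of this form could then be passed to an image-based classifier such as a CNN, with one image per sample.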