Analyzing RNA-Seq gene expression data using deep learning approaches for cancer classification

Ribonucleic acid sequencing (RNA-Seq) analysis is particularly useful for obtaining insights into differentially expressed genes. However, it is challenging because the data are high-dimensional. Such analysis is a tool for finding underlying patterns in data, e.g., for cancer-specific biomark...

Bibliographic Details
Main Authors: Laiqa Rukhsar, Waqas Haider Bangyal, Muhammad Sadiq Ali Khan, Ag Asri Ag Ibrahim, Kashif Nisar, Danda B. Rawat
Format: Article
Language: English
Published: MDPI AG, Basel, Switzerland 2022
Subjects: QP1-(981) Physiology; RC254-282 Neoplasms. Tumors. Oncology Including cancer and carcinogens
Online Access: https://eprints.ums.edu.my/id/eprint/32759/1/Analyzing%20RNA-Seq%20gene%20expression%20data%20using%20deep%20learning%20approaches%20for%20cancer%20classification.ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/32759/2/Analyzing%20RNA-Seq%20gene%20expression%20data%20using%20deep%20learning%20approaches%20for%20cancer%20classification.pdf
https://eprints.ums.edu.my/id/eprint/32759/
https://www.mdpi.com/2076-3417/12/4/1850/htm
https://doi.org/10.3390/app12041850
Institution: Universiti Malaysia Sabah
id my.ums.eprints.32759
record_format eprints
spelling my.ums.eprints.32759 2022-06-09T04:14:56Z https://eprints.ums.edu.my/id/eprint/32759/
MDPI AG, Basel, Switzerland 2022-02-11 Article PeerReviewed
Laiqa Rukhsar and Waqas Haider Bangyal and Muhammad Sadiq Ali Khan and Ag Asri Ag Ibrahim and Kashif Nisar and Danda B. Rawat (2022) Analyzing RNA-Seq gene expression data using deep learning approaches for cancer classification. Applied Sciences, 12 (4). pp. 1-17. ISSN 2076-3417 https://www.mdpi.com/2076-3417/12/4/1850/htm https://doi.org/10.3390/app12041850
institution Universiti Malaysia Sabah
building UMS Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sabah
content_source UMS Institutional Repository
url_provider http://eprints.ums.edu.my/
language English
topic QP1-(981) Physiology
RC254-282 Neoplasms. Tumors. Oncology Including cancer and carcinogens
description Ribonucleic acid sequencing (RNA-Seq) analysis is particularly useful for obtaining insights into differentially expressed genes. However, it is challenging because the data are high-dimensional. Such analysis is a tool for finding underlying patterns in data, e.g., cancer-specific biomarkers. In the past, analyses were performed on RNA-Seq data in which the positive and negative samples pertained to the same cancer class, i.e., without samples of other cancer types. To classify multiple cancer types and to find differentially expressed genes, data for multiple cancer types need to be analyzed. Several repositories offer RNA-Seq data for various cancer types. In this paper, data from the Mendeley Data repository for five cancer types are analyzed. As a first step, RNA-Seq values are converted to 2D images using normalization and zero padding. In the next step, relevant features are extracted and selected using Deep Learning (DL). In the last phase, classification is performed using eight DL algorithms. Results and discussion are based on four different splitting strategies and k-fold cross-validation for each DL classifier. Furthermore, a comparative analysis is performed with state-of-the-art techniques discussed in the literature. The results demonstrated that the classifiers performed best with a 70–30 split and that the Convolutional Neural Network (CNN) achieved the best overall results. Hence, CNN is the best DL model for classification among the eight studied, and it is easy to implement and simple to understand.
format Article
author Laiqa Rukhsar
Waqas Haider Bangyal
Muhammad Sadiq Ali Khan
Ag Asri Ag Ibrahim
Kashif Nisar
Danda B. Rawat
author_sort Laiqa Rukhsar
title Analyzing RNA-Seq gene expression data using deep learning approaches for cancer classification
publisher MDPI AG, Basel, Switzerland
publishDate 2022
url https://eprints.ums.edu.my/id/eprint/32759/1/Analyzing%20RNA-Seq%20gene%20expression%20data%20using%20deep%20learning%20approaches%20for%20cancer%20classification.ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/32759/2/Analyzing%20RNA-Seq%20gene%20expression%20data%20using%20deep%20learning%20approaches%20for%20cancer%20classification.pdf
https://eprints.ums.edu.my/id/eprint/32759/
https://www.mdpi.com/2076-3417/12/4/1850/htm
https://doi.org/10.3390/app12041850
_version_ 1760231070261837824
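
The description above mentions converting RNA-Seq values to 2D images using normalization and zero padding. The record does not say which normalization or image size the authors used, so the following is only a minimal sketch under stated assumptions: min-max scaling to [0, 1], a square image whose side is the smallest integer that fits all values, and zero padding appended at the end. The function name `expression_to_image` is hypothetical and is not taken from the article.

```python
import math


def expression_to_image(values):
    """Turn one sample's 1D expression vector into a square 2D grid.

    Assumptions (not from the article): min-max scaling to [0, 1],
    and trailing zero padding up to the next perfect square.
    """
    n = len(values)
    if n == 0:
        raise ValueError("expected at least one expression value")

    # Min-max normalization; a constant vector maps to all zeros.
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = [(v - lo) / span if span > 0 else 0.0 for v in values]

    # Smallest side length whose square holds all n values.
    side = math.isqrt(n - 1) + 1

    # Zero padding fills the cells that have no corresponding gene.
    padded = scaled + [0.0] * (side * side - n)

    # Row-major reshape into a side x side grid.
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Five made-up expression values become a 3 x 3 image with four padded cells.
image = expression_to_image([2.0, 4.0, 6.0, 8.0, 10.0])
for row in image:
    print(row)
```

A grid of this kind could then be passed to an image classifier such as a CNN. In practice the padding position and the scaling method would need to match whatever the article specifies.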