Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data

Cancer markers play a significant role in the diagnosis of the origin of cancers and in the detection of cancers from initial treatments. This is a challenging task owing to the heterogeneous nature of cancers. Identification of these markers could help in improving the survival rate of cancer patie...

Bibliographic Details
Main Author: Nies, Hui Wen
Format: Thesis
Language: English
Published: 2020
Subjects:
Online Access: http://eprints.utm.my/id/eprint/98108/1/NiesHuiWenPSC2020.pdf
http://eprints.utm.my/id/eprint/98108/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143755
Institution: Universiti Teknologi Malaysia
id my.utm.98108
record_format eprints
spelling my.utm.98108 2022-11-14T10:07:17Z http://eprints.utm.my/id/eprint/98108/ Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data Nies, Hui Wen QA75 Electronic computers. Computer science 2020 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/98108/1/NiesHuiWenPSC2020.pdf Nies, Hui Wen (2020) Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data. PhD thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143755
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic QA75 Electronic computers. Computer science
description Cancer markers play a significant role in the diagnosis of the origin of cancers and in the detection of cancers from initial treatments. Identifying these markers is challenging owing to the heterogeneous nature of cancers, but it could help improve the survival rate of cancer patients, because dedicated treatment, or even prevention, can be provided according to the diagnosis. Previous investigations show that pathway topology information can help in detecting cancer markers from gene expression data. Such analysis reduces the complexity of the problem from thousands of genes to a few hundred pathways. However, most existing methods group different cancer subtypes together as disease samples and assume that all pathways contribute equally to the analysis. Several other methods ignore the interactions between multiple genes and genes with missing edges, which could lead to poor performance in identifying cancer markers from gene expression data. This research therefore proposes an enhanced directed random walk to identify pathway and gene markers for multiclass cancer gene expression data. First, an improved pathway selection using analysis of variance (ANOVA) is performed, which allows multiple cancer subtypes to be considered. Subsequently, k-means clustering and the average silhouette method are integrated into the directed random walk so that the interaction of multiple genes is considered. The proposed methods are tested on benchmark gene expression datasets (breast, lung, and skin cancers) and biological pathways. Their performance is measured and compared in terms of classification accuracy and area under the receiver operating characteristic curve (AUC). The results indicate that the proposed methods identify a list of pathway and gene markers from the datasets with better classification accuracy and AUC, improving classification performance by between 1% and 35% compared with existing methods. The cell cycle and p53 signaling pathways were found to be significantly associated with breast, lung, and skin cancers, while the cell cycle pathway was highly enriched in squamous cell carcinoma and adenocarcinoma.
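The ANOVA-based pathway selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the pathway names, and the activity scores below are all hypothetical, and the sketch only shows the general idea of ranking pathways by a one-way ANOVA F statistic computed across more than two classes (for example, cancer subtypes).

```python
from typing import Dict, List

# Hypothetical layout: pathway name -> class label -> per-sample activity scores.
PathwayScores = Dict[str, Dict[str, List[float]]]


def anova_f_statistic(groups: Dict[str, List[float]]) -> float:
    """One-way ANOVA F statistic for one pathway across several classes."""
    if len(groups) < 2 or any(len(v) == 0 for v in groups.values()):
        raise ValueError("need at least two non-empty groups")
    all_values = [x for values in groups.values() for x in values]
    n_total, k = len(all_values), len(groups)
    if n_total <= k:
        raise ValueError("need more samples than groups")
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of class means around the grand mean
    ss_within = 0.0   # variation of samples around their own class mean
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in values)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0.0:
        # No within-class spread: infinitely separable, or no signal at all.
        return float("inf") if ms_between > 0.0 else 0.0
    return ms_between / ms_within


def rank_pathways(scores: PathwayScores, top_n: int) -> List[str]:
    """Return the top_n pathway names with the largest F statistic."""
    ranked = sorted(scores, key=lambda p: anova_f_statistic(scores[p]), reverse=True)
    return ranked[:top_n]


# Toy, made-up data: pathway_A separates three classes, pathway_B does not.
toy_scores: PathwayScores = {
    "pathway_A": {"class_1": [1.0, 2.0, 3.0], "class_2": [4.0, 5.0, 6.0], "class_3": [7.0, 8.0, 9.0]},
    "pathway_B": {"class_1": [1.0, 2.0, 3.0], "class_2": [2.0, 3.0, 1.0], "class_3": [3.0, 1.0, 2.0]},
}
selected = rank_pathways(toy_scores, top_n=1)
```

Unlike a two-sample t-test, the F statistic accepts any number of classes, which is the property the abstract relies on when it avoids pooling all subtypes into a single disease group.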
format Thesis
author Nies, Hui Wen
title Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data
publishDate 2020
_version_ 1751536148741619712
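As a rough illustration of the directed random walk named in the title, the following sketch runs a generic random walk with restart over a directed gene graph. It is an assumption-laden example rather than the enhanced method of the thesis: the function name, the restart probability of 0.7, the gene labels, and the initial weights are all hypothetical, and the k-means and silhouette steps from the abstract are not modelled here.

```python
from typing import Dict, List, Tuple


def directed_random_walk(
    edges: List[Tuple[str, str]],
    initial: Dict[str, float],
    restart: float = 0.7,   # hypothetical restart probability
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate w <- (1 - restart) * M^T w + restart * w0 until convergence.

    M is the row-normalised adjacency matrix of the directed graph and w0 is
    the normalised initial weight vector (for example, per-gene statistics).
    """
    nodes = sorted(set(initial) | {n for edge in edges for n in edge})
    total = sum(initial.values())
    if total <= 0.0:
        raise ValueError("initial weights must sum to a positive value")

    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    w0 = {n: initial.get(n, 0.0) / total for n in nodes}
    w = dict(w0)
    for _ in range(max_iter):
        # Restart term: every node gets back a share of its initial weight.
        nxt = {n: restart * w0[n] for n in nodes}
        # Walk term: each node passes weight along its outgoing edges.
        # Nodes without outgoing edges pass nothing on, so the total weight
        # can drop below 1 in this simplified version.
        for src, dst in edges:
            nxt[dst] += (1.0 - restart) * w[src] / out_degree[src]
        change = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if change < tol:
            break
    return w


# Toy, made-up graph: gene_a regulates gene_b; both start with equal weight.
toy_weights = directed_random_walk(
    edges=[("gene_a", "gene_b")],
    initial={"gene_a": 1.0, "gene_b": 1.0},
)
```

In this toy graph the downstream gene ends with the larger weight because it collects weight passed along the edge in addition to its own restart share, which is the sense in which such a walk folds pathway topology into gene weights.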