A comparative study on gene selection methods for tissues classification on large scale gene expression data

Deoxyribonucleic acid (DNA) microarray technology is a recent invention that provides vast opportunities to measure the expression of a large number of genes simultaneously. However, interpreting large-scale gene expression data remains a challenging issue due to their innate nature of “high dimensio...

Bibliographic Details
Main Author: Ahmad, Farzana Kabir
Format: Article
Language: English
Published: Penerbit UTM Press 2016
Subjects: QA75 Electronic computers. Computer science; R Medicine (General)
Online Access: http://repo.uum.edu.my/20187/1/JT%2078%205-10%202016%20116%E2%80%93125.pdf
http://repo.uum.edu.my/20187/
http://doi.org/10.11113/jt.v78.8843
Institution: Universiti Utara Malaysia
id my.uum.repo.20187
record_format eprints
spelling my.uum.repo.20187 2016-12-04T08:48:31Z http://repo.uum.edu.my/20187/ A comparative study on gene selection methods for tissues classification on large scale gene expression data Ahmad, Farzana Kabir QA75 Electronic computers. Computer science R Medicine (General) Penerbit UTM Press 2016 Article PeerReviewed application/pdf en http://repo.uum.edu.my/20187/1/JT%2078%205-10%202016%20116%E2%80%93125.pdf Ahmad, Farzana Kabir (2016) A comparative study on gene selection methods for tissues classification on large scale gene expression data. Jurnal Teknologi, 78 (5-10). pp. 116-125. ISSN 0127-9696 http://doi.org/10.11113/jt.v78.8843 doi:10.11113/jt.v78.8843
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
R Medicine (General)
description Deoxyribonucleic acid (DNA) microarray technology is a recent invention that provides vast opportunities to measure the expression of a large number of genes simultaneously. However, interpreting large-scale gene expression data remains challenging because of its innate “high dimensional, low sample size” nature. Microarray data typically involve thousands of genes, n, measured in a very small number of samples, p, which complicates the data analysis process. For this reason, feature selection methods, also known as gene selection methods, have become necessary to select significant genes that offer the maximum discriminative power between cancerous and normal tissues. Feature selection methods can be grouped into three basic categories: a) filter methods; b) wrapper methods; and c) embedded methods. Among these, filter gene selection methods provide an easy way to identify informative genes and can simply reduce large-scale microarray datasets. Although filter-based gene selection techniques have been commonly used in analyzing microarray datasets, they have been tested separately in different studies. Therefore, this study aims to investigate and compare the effectiveness of four popular filter gene selection methods, namely Signal-to-Noise Ratio (SNR), Fisher Criterion (FC), Information Gain (IG), and t-Test, in selecting informative genes that can distinguish cancerous from normal tissues. In this experiment, a common classifier, the Support Vector Machine (SVM), is trained on the selected genes. The gene selection methods are tested on three large-scale gene expression datasets, namely a breast cancer dataset, a colon dataset, and a lung dataset. The study found that IG and SNR are more suitable for use with SVM. Furthermore, it showed that SVM performance remained relatively unaffected unless a very small number of genes was selected.
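As an illustration of how a filter method ranks genes, the following is a minimal sketch of two of the criteria named in the description, SNR and the Fisher Criterion. The abstract does not give the formulas, so the definitions used here (difference of class means over the sum of standard deviations for SNR, squared difference of means over the sum of variances for FC) are the commonly cited textbook forms and are an assumption, not necessarily the article's exact formulation. The function names, gene names, and toy expression values are hypothetical, and the SVM training step is not shown.

```python
from statistics import mean, stdev


def snr_score(cancer, normal):
    """Signal-to-noise ratio for one gene (assumed form):
    (mean_cancer - mean_normal) / (sd_cancer + sd_normal)."""
    return (mean(cancer) - mean(normal)) / (stdev(cancer) + stdev(normal))


def fisher_score(cancer, normal):
    """Fisher criterion for one gene (assumed form):
    (mean_cancer - mean_normal)^2 / (var_cancer + var_normal)."""
    diff = mean(cancer) - mean(normal)
    return diff ** 2 / (stdev(cancer) ** 2 + stdev(normal) ** 2)


def rank_genes(expression, labels, score, top_k):
    """Return the names of the top_k genes by absolute score.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list with 1 (cancer) or 0 (normal) for each sample
    score:      function taking (cancer_values, normal_values)
    Note: a gene with zero spread in both classes would divide by zero;
    this sketch does not handle that case.
    """
    scores = {}
    for gene, values in expression.items():
        cancer = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(score(cancer, normal))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: three genes measured in six samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],  # strongly separated
    "gene_b": [2.0, 3.0, 1.0, 2.1, 2.9, 1.2],  # barely separated
    "gene_c": [3.0, 3.5, 2.5, 2.0, 2.5, 1.5],  # moderately separated
}

top_snr = rank_genes(expression, labels, snr_score, top_k=2)
top_fc = rank_genes(expression, labels, fisher_score, top_k=2)
print(top_snr, top_fc)
```

Because each gene is scored independently of the classifier, the selected subset can then be passed to any learner, which is what makes filter methods inexpensive compared with wrapper and embedded methods.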
format Article
author Ahmad, Farzana Kabir
title A comparative study on gene selection methods for tissues classification on large scale gene expression data
publisher Penerbit UTM Press
publishDate 2016
url http://repo.uum.edu.my/20187/1/JT%2078%205-10%202016%20116%E2%80%93125.pdf
http://repo.uum.edu.my/20187/
http://doi.org/10.11113/jt.v78.8843