Effective gene selection techniques for classification of gene expression data
Main Author: | Yeo, Lee Chin |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2005 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/4222/1/YeoLeeChinMFSKSM2005.pdf http://eprints.utm.my/id/eprint/4222/ |
Institution: | Universiti Teknologi Malaysia |
id | my.utm.4222 |
---|---|
record_format | eprints |
citation | Yeo, Lee Chin (2005) Effective gene selection techniques for classification of gene expression data. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System. Dated 2005-04; not peer reviewed; full text as PDF (English). |
building | UTM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Teknologi Malaysia |
content_source | UTM Institutional Repository |
url_provider | http://eprints.utm.my/ |
topic | QA75 Electronic computers. Computer science |
description | The recent introduction of microarray technology allows researchers to monitor the expression levels of thousands of genes in a single microarray experiment. Classifying tissue samples as tumor or normal is one application of this technology, and gene selection plays an important role in it. In this research, several existing gene selection techniques are studied, and improved techniques are proposed and developed. The proposed approach first groups genes with similar expression profiles into distinct clusters, then calculates the quality of each cluster and a discriminative score for each gene using statistical techniques, and finally selects informative genes from these clusters based on cluster quality and discriminative score. The selected gene subset is then used to train classifiers that construct rules for classifying future tissue samples. Several k-means and model-based clustering algorithms are proposed for grouping the genes. The statistical techniques used are the Fisher criterion, Golub's signal-to-noise ratio, the Mann-Whitney rank-sum statistic, and the traditional t-test. Support vector machines (SVM) and k-nearest neighbour (k-NN) are used as classifiers. The proposed approach is validated using leave-one-out cross-validation (LOOCV), and the receiver operating characteristic (ROC) score is used to analyze the results. The colon dataset, with 2000 genes and 62 tissue samples, is used for testing. The highest ROC score recorded in the experiments was 0.95, corresponding to five misclassifications. This result should be of significant value for diagnostic purposes as well as for guiding further exploration of the underlying biology. |
author | Yeo, Lee Chin |
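The abstract names four per-gene discriminative statistics. The sketch below shows one common way to compute each of them for a single gene whose expression values have been split by class (tumor vs. normal). It is a minimal illustration under stated assumptions, not the thesis's code: the function names are hypothetical, standard deviations are sample (n - 1) estimates, and the t-test is assumed to be the pooled-variance Student form.

```python
import math
import statistics
from typing import Sequence


def golub_snr(a: Sequence[float], b: Sequence[float]) -> float:
    """Golub signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (statistics.mean(a) - statistics.mean(b)) / (
        statistics.stdev(a) + statistics.stdev(b)
    )


def fisher_criterion(a: Sequence[float], b: Sequence[float]) -> float:
    """Fisher criterion: (mean_a - mean_b)^2 / (var_a + var_b)."""
    return (statistics.mean(a) - statistics.mean(b)) ** 2 / (
        statistics.variance(a) + statistics.variance(b)
    )


def t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student t statistic with pooled variance.

    Assumption: the thesis's "traditional t-test" may use a different variant.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / n_a + 1 / n_b)
    )


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Mann-Whitney U for sample a: count of pairs (x, y) with x > y; ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

Applied to every gene of an expression matrix, genes would typically be ranked by the absolute value of the score; the sign only indicates which class has the higher mean expression.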
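The cluster-then-select strategy described in the abstract can be sketched as follows, assuming plain Lloyd's k-means over gene expression profiles (the thesis also studies other k-means variants and model-based clustering). The abstract does not define its cluster-quality measure or its exact selection rule, so both are hypothetical stand-ins here: quality is 1 / (1 + mean squared distance to the centroid), and the highest-scoring gene is taken from each cluster in order of decreasing quality.

```python
from typing import List, Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(profiles: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means over gene profiles; returns one cluster label per gene.

    Simplifying assumption: initial centroids are k evenly spaced genes.
    """
    n = len(profiles)
    centroids = [list(profiles[i * n // k]) for i in range(k)]
    labels = [-1] * n
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [profiles[i] for i in range(n) if labels[i] == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_quality(profiles: List[List[float]], labels: List[int], c: int) -> float:
    """Hypothetical quality: 1 / (1 + mean squared distance to the cluster centroid)."""
    members = [p for p, lab in zip(profiles, labels) if lab == c]
    centroid = [sum(col) / len(members) for col in zip(*members)]
    msd = sum(squared_distance(p, centroid) for p in members) / len(members)
    return 1.0 / (1.0 + msd)


def select_genes(profiles: List[List[float]], scores: List[float], k: int) -> List[int]:
    """Pick the gene with the largest |score| from each cluster, best clusters first."""
    labels = kmeans(profiles, k)
    clusters = sorted(set(labels))
    clusters.sort(key=lambda c: cluster_quality(profiles, labels, c), reverse=True)
    selected = []
    for c in clusters:
        members = [i for i, lab in enumerate(labels) if lab == c]
        selected.append(max(members, key=lambda i: abs(scores[i])))
    return selected
```

Drawing from several clusters rather than simply taking the top-ranked genes overall is meant to avoid selecting many genes with near-identical expression profiles.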
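The evaluation in the abstract combines leave-one-out cross-validation with an ROC score. A minimal sketch, assuming a k-NN classifier whose output is the fraction of class-1 neighbours among the k nearest training samples, and assuming the ROC score is the area under the ROC curve computed from the pooled leave-one-out outputs. The SVM classifier is not shown, the function names are hypothetical, and the input is taken to be samples already restricted to a selected gene subset.

```python
import math
from typing import List, Sequence


def knn_score(train_x: List[List[float]], train_y: List[int],
              x: Sequence[float], k: int) -> float:
    """Fraction of the k nearest training samples (Euclidean) labelled 1."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return sum(train_y[i] for i in ranked[:k]) / k


def loocv_knn(x: List[List[float]], y: List[int], k: int = 3) -> List[float]:
    """Leave-one-out: score each sample with k-NN fitted on all other samples."""
    scores = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(knn_score(train_x, train_y, x[i], k))
    return scores


def roc_auc(scores: List[float], y: List[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, y) if lab == 1]
    neg = [s for s, lab in zip(scores, y) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def misclassifications(scores: List[float], y: List[int]) -> int:
    """Errors when leave-one-out outputs are thresholded at 0.5."""
    return sum(1 for s, lab in zip(scores, y) if (s > 0.5) != (lab == 1))
```

With an odd k and two classes, no output equals 0.5 exactly, so the threshold never produces ties.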