Comparative study of feature selection method of microarray data for gene classification

Bibliographic Details
Main Author: Ghazali, Nurulhuda
Format: Thesis
Language: English
Published: 2009
Online Access: http://eprints.utm.my/id/eprint/11502/6/NurulhudaGhazaliMFSKSM2009.pdf
http://eprints.utm.my/id/eprint/11502/
Institution: Universiti Teknologi Malaysia
Description
Summary: Recent advances in biotechnology, such as microarrays, make it possible to measure the expression levels of thousands of genes in parallel. Analysis of microarray data can provide insight into gene function and regulatory mechanisms, and it is crucial for identifying and classifying cancers. Current approaches to cancer classification are based on gene expression profiles rather than on the morphological appearance of the tumor. However, the task is made more difficult by the noisy nature of microarray data and the overwhelming number of genes. It is therefore important to select a small subset of genes, referred to as informative genes, to represent the thousands of genes in microarray data. These informative genes are then used for classification into the appropriate classes. To address the classification problem, we proposed combining the minimum Redundancy-Maximum Relevance (mRMR) feature selection method with a Probabilistic Neural Network (PNN) classifier: mRMR selects the informative genes, and the PNN performs the classification. The approach was tested on the well-known Leukemia cancer dataset. The results show that the selected genes give high classification accuracy. Reducing the number of genes eases the burden on biologists, and the improved classification accuracy could support the detection of cancer at an early stage.
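
The two-stage approach described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation, which this record does not show. Assumptions: the function names (`discretize`, `mutual_information`, `mrmr_select`, `pnn_predict`) are hypothetical, expression values are discretized with a mean +/- 0.5 standard deviation rule, gene selection uses the mutual information difference form of the minimum redundancy, maximum relevance criterion, the classifier is a Gaussian-kernel probabilistic neural network with an assumed smoothing parameter `sigma`, and the data are a made-up toy matrix, not the Leukemia dataset.

```python
import numpy as np

def discretize(X):
    """Map each gene to {-1, 0, 1} using mean +/- 0.5 std (an assumed rule)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return np.where(X > mu + 0.5 * sd, 1, np.where(X < mu - 0.5 * sd, -1, 0))

def mutual_information(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    a_vals, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)        # contingency counts
    joint /= joint.sum()                         # joint probabilities
    pa = joint.sum(axis=1, keepdims=True)        # marginal of a
    pb = joint.sum(axis=0, keepdims=True)        # marginal of b
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def mrmr_select(X_disc, y, k):
    """Greedily pick k genes maximizing relevance minus mean redundancy."""
    n = X_disc.shape[1]
    relevance = np.array([mutual_information(X_disc[:, j], y) for j in range(n)])
    selected = [int(np.argmax(relevance))]       # start with the most relevant gene
    redundancy_sum = np.zeros(n)
    while len(selected) < k:
        last = X_disc[:, selected[-1]]
        redundancy_sum += np.array(
            [mutual_information(X_disc[:, j], last) for j in range(n)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf                # never re-select a gene
        selected.append(int(np.argmax(score)))
    return selected

def pnn_predict(X_train, y_train, X_test, sigma=1.0):
    """Probabilistic neural network: average Gaussian kernel per class, then argmax."""
    classes = np.unique(y_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))    # pattern-layer activations
    scores = np.stack(
        [kernel[:, y_train == c].mean(axis=1) for c in classes], axis=1
    )                                            # summation layer, one column per class
    return classes[np.argmax(scores, axis=1)]

# Toy data (made up): 8 samples, 4 "genes", 4 classes.
# Gene 1 duplicates gene 0 (redundant); gene 3 carries no class information.
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])
X = np.array([
    [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1],
    [1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 1, 1],
], dtype=float)

X_disc = discretize(X)
selected = mrmr_select(X_disc, y, k=2)
X_sel = X_disc[:, selected].astype(float)
pred = pnn_predict(X_sel, y, X_sel, sigma=1.0)   # resubstitution, for illustration only
```

On this toy matrix the redundancy term keeps the duplicated gene from being chosen alongside its copy. On real microarray data the discretization thresholds, the number of genes `k`, and `sigma` would all need tuning, and accuracy should be estimated on held-out samples, not on the training set.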