Feature selection for micro-array data classification

Bibliographic Details
Main Author: Yu, Yaping
Other Authors: Wang Lipo
Format: Final Year Project
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/73007
Institution: Nanyang Technological University
Description
Summary: DNA microarray technology can measure the expression of thousands of genes simultaneously, which gives it wide application in the study of biological processes and in biomedical research. Knowledge of microarray data analysis is growing steadily, and it is important and useful for the phenotype classification of diseases. Classification techniques are applied to identify and interpret microarray gene expression data. From a machine learning perspective, gene selection is a form of feature selection. Microarray classification operates on data described by many thousands of features, so a feature selection algorithm is used to select the most significant ones, because a large number of features can lead to low prediction accuracy and very high computational complexity. This project explores various feature selection algorithms to determine the smallest set of genes responsible for identifying a disease. Microarray gene expression data plays a very important role in disease diagnosis and prognosis and helps in choosing an appropriate treatment plan for patients. Two feature selection algorithms are presented in this report: we implemented one feature selection method and compared it with another by Loris Nanni and Alessandra Lumini [12]. Running the experiments in Matlab, we aimed to find the smallest gene subsets that still achieve high accuracy. Finding the smallest gene subsets matters because it reduces the computational burden, allows an accurate diagnosis from a minimal number of genes, greatly decreases the cost of cancer testing, and shortens the time to treatment. In simple terms, the project is divided into two steps. First, gene importance ranking yields a set of informative and important genes. Then all possible combinations of these important genes are tested with a support vector machine to obtain their classification accuracy.
All in all, our project reduces the number of genes required, giving a faster route to treatment with high accuracy.
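The two-step procedure described in the summary (rank genes, then test combinations of the top-ranked ones) can be illustrated with a minimal sketch. It is not the project's Matlab implementation. Everything in it is an assumption made for illustration: the t-statistic ranking criterion, the leave-one-out evaluation, the function names, and the toy data. To keep the example dependency-free, a simple nearest-centroid classifier stands in for the support vector machine used in the project.

```python
from itertools import combinations
from statistics import mean, variance


def t_score(values, labels):
    """Absolute Welch-style t-statistic of one gene between classes 0 and 1."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = abs(mean(a) - mean(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def rank_genes(X, y, top_k):
    """Step 1: return the indices of the top_k genes, most discriminative first."""
    n_genes = len(X[0])
    scores = [t_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given genes.

    Hypothetical stand-in for the SVM: any classifier could be plugged in here.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]
        pred = min(
            centroids,
            key=lambda c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def smallest_subset(X, y, top_k=4, target=1.0):
    """Step 2: test all combinations of the ranked genes, smallest subsets first."""
    ranked = rank_genes(X, y, top_k)
    best_acc, best_genes = 0.0, ()
    for size in range(1, len(ranked) + 1):
        for subset in combinations(ranked, size):
            acc = loo_accuracy(X, y, subset)
            if acc > best_acc:
                best_acc, best_genes = acc, subset
        if best_acc >= target:
            break  # stop at the smallest subset size that reaches the target
    return best_genes, best_acc


# Toy data (made up): 6 samples x 4 genes; only gene 2 separates the classes.
X = [
    [5.1, 2.0, 0.9, 7.0],
    [4.8, 2.2, 1.1, 6.5],
    [5.3, 1.9, 1.0, 7.2],
    [5.0, 2.1, 3.0, 6.8],
    [4.9, 2.0, 3.2, 7.1],
    [5.2, 2.3, 2.9, 6.6],
]
y = [0, 0, 0, 1, 1, 1]

subset, acc = smallest_subset(X, y)
print(subset, acc)
```

Ranking first keeps the exhaustive search tractable: the number of combinations grows exponentially with the number of candidate genes, so only a short list of top-ranked genes can realistically be searched this way.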