Feature selection for micro-array data classification

DNA microarray technology can identify thousands of genes at the same time, which gives it wide application in the study of biological processes and in biomedical research. Knowledge of microarray data analysis is growing steadily, and it is important and useful for phenotype classifica...

Bibliographic Details
Main Author: Yu, Yaping
Other Authors: Wang Lipo
Format: Final Year Project
Language: English
Published: 2017
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: http://hdl.handle.net/10356/73007
Institution: Nanyang Technological University
id sg-ntu-dr.10356-73007
record_format dspace
spelling sg-ntu-dr.10356-73007 2023-07-07T17:05:53Z Feature selection for micro-array data classification Yu, Yaping Wang Lipo School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Bachelor of Engineering 2017-12-19T04:19:31Z 2017-12-19T04:19:31Z 2017 Final Year Project (FYP) http://hdl.handle.net/10356/73007 en Nanyang Technological University 71 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
spellingShingle DRNTU::Engineering::Electrical and electronic engineering
Yu, Yaping
Feature selection for micro-array data classification
description DNA microarray technology can identify thousands of genes at the same time, which gives it wide application in the study of biological processes and in biomedical research. Knowledge of microarray data analysis is growing steadily, and it is important and useful for the phenotype classification of diseases. Classification techniques are applied to identify and interpret microarray gene expression data. From a machine learning perspective, gene selection is treated as feature selection. Microarray classification operates on data described by many thousands of features. A feature selection algorithm is used to select the most significant features, because a large number of features can lead to low prediction accuracy and very high computational complexity. This project explores various feature selection algorithms to determine the smallest set of genes responsible for identifying a disease. Microarray gene expression data plays a very important role in disease diagnosis and prognosis and helps in choosing an appropriate treatment plan for patients. Two feature selection algorithms are presented in this report: we implemented one feature selection method and compared it with another by Loris Nanni and Alessandra Lumini [12]. Experiments were run in Matlab, with the aim of finding the smallest gene subsets that still give high accuracy. Finding the smallest gene subsets matters because it reduces the computational burden, allows an accurate diagnosis from a minimum number of genes, greatly decreases the cost of cancer testing, and reduces the time needed for treatment. In simple terms, the project is divided into two steps. First, gene importance ranking yields a set of informative and important genes. Then all possible combinations of the important genes are tested with a support vector machine to obtain the accuracy of each combination.
All in all, our project can reduce the number of genes required, enabling faster treatment with high accuracy.
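The two-step procedure described in the abstract (rank the genes, then test combinations of the top-ranked genes) can be illustrated with a minimal Python sketch. It is not the report's implementation: the report used Matlab and a support vector machine, and the record does not say which ranking criterion was used. The sketch assumes a Welch-style t-statistic for ranking and swaps in a nearest-centroid classifier for the SVM so that it runs on the standard library alone. The function names (`rank_genes`, `loo_accuracy`, `smallest_subset`), the `top_k` and `target` parameters, and the toy data are all hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def t_score(values_a, values_b):
    """Absolute Welch-style t-statistic for one gene (assumed ranking criterion)."""
    diff = abs(mean(values_a) - mean(values_b))
    var_a = stdev(values_a) ** 2 / len(values_a)
    var_b = stdev(values_b) ** 2 / len(values_b)
    denom = (var_a + var_b) ** 0.5
    if denom == 0:
        # Zero variance in both classes: perfect separation or no signal at all.
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def rank_genes(X, y):
    """Step 1: return gene indices sorted from most to least discriminative."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(t_score(a, b))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]
        sample = [X[i][g] for g in genes]
        predicted = min(
            centroids,
            key=lambda c: sum((s - m) ** 2 for s, m in zip(sample, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def smallest_subset(X, y, top_k=4, target=1.0):
    """Step 2: try every combination of the top_k genes, smallest subsets first."""
    candidates = rank_genes(X, y)[:top_k]
    best = ((), 0.0)
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            acc = loo_accuracy(X, y, subset)
            if acc > best[1]:
                best = (subset, acc)
        if best[1] >= target:
            break  # a subset of this size already reaches the target accuracy
    return best


# Toy data (invented): 6 samples x 4 genes; labels are 0 or 1.
X = [
    [5.1, 2.0, 0.9, 3.3],
    [4.9, 2.2, 1.1, 3.0],
    [5.0, 1.9, 1.0, 3.4],
    [5.2, 2.1, 3.0, 3.1],
    [4.8, 2.0, 3.2, 3.2],
    [5.0, 2.3, 2.9, 3.5],
]
y = [0, 0, 0, 1, 1, 1]

subset, acc = smallest_subset(X, y)
print(subset, acc)  # prints: (2,) 1.0
```

Searching subsets in order of increasing size means the first size that reaches the target accuracy is also the smallest, which matches the stated goal of finding a minimal gene subset. Ranking the genes on all samples before cross-validation tends to give optimistic accuracy estimates; a more careful version would repeat the ranking inside each leave-one-out fold.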
author2 Wang Lipo
author_facet Wang Lipo
Yu, Yaping
format Final Year Project
author Yu, Yaping
author_sort Yu, Yaping
title Feature selection for micro-array data classification
title_short Feature selection for micro-array data classification
title_full Feature selection for micro-array data classification
title_fullStr Feature selection for micro-array data classification
title_full_unstemmed Feature selection for micro-array data classification
title_sort feature selection for micro-array data classification
publishDate 2017
url http://hdl.handle.net/10356/73007
_version_ 1772827646328045568