HPC-enabled GA-SVM feature selection model for large-scale data

With the explosive growth of data to be processed in areas such as bioinformatics, scientific simulation and e-commerce, data mining techniques are essential for making proactive, prudent and knowledge-driven decisions. The support vector machine (SVM), pioneered by Vapnik, has been chosen in this...

Bibliographic Details
Main Author:	Tay, Darwin Jia Xian.
Other Authors:	Stephen John Turner
Format:	Final Year Project
Language:	English
Published:	2009
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	http://hdl.handle.net/10356/18903
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-18903
record_format	dspace
spelling	sg-ntu-dr.10356-18903 2023-03-03T20:25:32Z School of Computer Engineering Bachelor of Engineering (Computer Science) 2009-08-17T02:42:23Z 2009 Final Year Project (FYP) en 104 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description	With the explosive growth of data to be processed in areas such as bioinformatics, scientific simulation and e-commerce, data mining techniques are essential for making proactive, prudent and knowledge-driven decisions. The support vector machine (SVM), pioneered by Vapnik, was chosen in this work as the data mining tool for its excellent generalization performance. In particular, LibSVM was selected as the software package for classification because of its sound performance and popularity. This paper proposes a hybrid model for solving the model selection problem associated with SVM. The model, HPC-enabled GA-SVM, combines a genetic algorithm (GA) with high performance computing (HPC) techniques, such as parallelism via OpenMP and MPI, to carry out model selection. GA was selected for its capability to perform effective feature selection, while the HPC techniques improve computational performance. An exploration technique, ‘Uniform Design’ (UD), is also employed to enhance the performance of the proposed model. A speedup of 29.02 times was achieved over the traditional ‘grid’ search algorithm, an exhaustive search approach, without compromising much accuracy. Moreover, a “relaxed” caching policy is proposed to avoid re-evaluating combinations that lie in the vicinity of previously evaluated ones. This yields a speedup of 72.83 times over the ‘grid’ search algorithm.
author2	Stephen John Turner
author_facet	Stephen John Turner Tay, Darwin Jia Xian.
format	Final Year Project
author	Tay, Darwin Jia Xian.
author_sort	Tay, Darwin Jia Xian.
title	HPC-enabled GA-SVM feature selection model for large-scale data
publishDate	2009
url	http://hdl.handle.net/10356/18903
_version_	1759855839530713088
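
The "relaxed" caching policy named in the description can be illustrated with a minimal sketch. The record does not say how "vicinity" is measured, so everything here is an assumption for illustration: the class name `RelaxedCache`, the `cell` width, the choice of bucketing (C, gamma) pairs on a log2 grid, and the `toy_accuracy` stand-in for a cross-validated SVM score are all hypothetical and are not taken from the project itself.

```python
import math
from typing import Callable, Dict, Tuple


class RelaxedCache:
    """Reuse a stored score when a new (C, gamma) pair falls in the same
    log2-space cell as a previously evaluated pair (an assumed notion of
    "vicinity", not the one defined in the project)."""

    def __init__(self, evaluate: Callable[[float, float], float], cell: float = 0.5):
        self.evaluate = evaluate          # expensive scoring function
        self.cell = cell                  # cell width in log2 units (assumed)
        self.store: Dict[Tuple[int, int], float] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, c: float, gamma: float) -> Tuple[int, int]:
        # Map each parameter to the index of its nearest grid cell.
        return (round(math.log2(c) / self.cell),
                round(math.log2(gamma) / self.cell))

    def score(self, c: float, gamma: float) -> float:
        key = self._key(c, gamma)
        if key in self.store:
            self.hits += 1                # nearby point already evaluated
            return self.store[key]
        self.misses += 1
        value = self.evaluate(c, gamma)   # pay the full cost once per cell
        self.store[key] = value
        return value


def toy_accuracy(c: float, gamma: float) -> float:
    # Hypothetical stand-in for cross-validated SVM accuracy.
    return 1.0 / (1.0 + (math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


cache = RelaxedCache(toy_accuracy, cell=0.5)
first = cache.score(8.0, 2 ** -5)       # evaluated and stored
nearby = cache.score(8.5, 2 ** -5.1)    # same cell, served from the cache
far = cache.score(64.0, 2 ** -5)        # different cell, evaluated again
```

Bucketing is only one way to approximate vicinity: two points that sit on opposite sides of a cell boundary are close but will not share a cache entry, so a distance-based lookup may suit some searches better. A larger `cell` trades accuracy for fewer evaluations.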

Similar Items