Information Theoretic-based Feature Selection for Machine Learning

Three major factors that determine the performance of a machine learning system are the choice of a representative set of features, the choice of a suitable machine learning algorithm, and the right selection of training parameters for that algorithm. This thesis tackles the proble...

Bibliographic Details
Main Author:	Muhammad Aliyu, Sulaiman
Format:	Thesis
Language:	English
Published:	Universiti Malaysia Sarawak (UNIMAS) 2018
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://ir.unimas.my/id/eprint/26595/1/Information%20Theoretic-based%20Feature%2024pgs.pdf http://ir.unimas.my/id/eprint/26595/4/Information%20Theoretic-based%20Feature%20ft.pdf http://ir.unimas.my/id/eprint/26595/
Institution:	Universiti Malaysia Sarawak

id	my.unimas.ir.26595
record_format	eprints
spelling	my.unimas.ir.26595 2023-03-23T07:47:33Z http://ir.unimas.my/id/eprint/26595/ Thesis NonPeerReviewed Muhammad Aliyu, Sulaiman (2018) Information Theoretic-based Feature Selection for Machine Learning. PhD thesis, Universiti Malaysia Sarawak (UNIMAS).
institution	Universiti Malaysia Sarawak
building	Centre for Academic Information Services (CAIS)
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaysia Sarawak
content_source	UNIMAS Institutional Repository
url_provider	http://ir.unimas.my/
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Muhammad Aliyu, Sulaiman Information Theoretic-based Feature Selection for Machine Learning
description	Three major factors that determine the performance of a machine learning system are the choice of a representative set of features, the choice of a suitable machine learning algorithm, and the right selection of training parameters for that algorithm. This thesis tackles the problem of feature selection for supervised machine learning prediction tasks through dependency information. The feature evaluation strategy is formulated on mutual information (MI) so that it handles both classification and regression tasks, and the search strategy is a modified greedy forward strategy designed to manage redundancy between features and to avoid features that are irrelevant to the predicted output. The problem with many existing feature selection methods that evaluate features by mutual information is that they are designed to handle classification tasks only, and the few that work for regression tasks were recently found to underestimate the mutual information between two strongly dependent variables. In addition, the search strategy used with many existing feature selection methods, usually a heuristic greedy method, lacks a scientifically sound stopping criterion, and the forward greedy procedure, despite its advantages over the backward procedure, is found to yield suboptimal results. Thus, this thesis developed and evaluated a filter-based Information Theoretic-based Feature Selection (IFS) method for machine learning.
Various experiments were carried out to assess and test the components of the IFS algorithm. The first test evaluated the formulated IFS selection criterion strategy (MI estimator) by comparing it with six benchmark MI estimators. The second test evaluated IFS in a controlled study using simulated datasets. The third test used ten natural domain datasets obtained from the UCI Repository, in about fifteen different experiments, using three to four different machine learning algorithms for performance evaluation. Additional experiments comparing the relative performance of IFS with five related feature selection algorithms were carried out using natural domain datasets.
This thesis also developed a hybrid filter method to enhance the performance of IFS. IFS serves as the filter and, together with an Ant Colony Optimization System (ACO) as a metaheuristic, forms the hybrid system. In this extended IFS method, feature selection was defined and presented as a 0-1 Knapsack Problem (MKP). Specifically, this thesis developed and evaluated the IFS_BACS (Binary Ant Colony System) hybrid method. Further experiments were carried out using the natural domain datasets, and comparisons were made between IFS and the hybrid IFS_BACS method. In most cases, IFS and its extended IFS_BACS hybrid method significantly reduced the number of features and produced competitive accuracy compared with the full feature set. Comparing IFS with its extended version, the extended version (IFS_BACS) seems more promising for selecting an optimal feature subset from large datasets.
format	Thesis
author	Muhammad Aliyu, Sulaiman
title	Information Theoretic-based Feature Selection for Machine Learning
publisher	Universiti Malaysia Sarawak (UNIMAS)
publishDate	2018
url	http://ir.unimas.my/id/eprint/26595/1/Information%20Theoretic-based%20Feature%2024pgs.pdf http://ir.unimas.my/id/eprint/26595/4/Information%20Theoretic-based%20Feature%20ft.pdf http://ir.unimas.my/id/eprint/26595/
_version_	1761623574118924288
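
The abstract in this record describes a filter that scores features by mutual information and adds them with a greedy forward search that penalises redundancy. The following is a minimal sketch of that general idea only, not the thesis's IFS algorithm: it assumes discrete-valued features, uses a simple plug-in (count-based) MI estimate, and scores candidates as relevance minus mean redundancy (an mRMR-style criterion swapped in for illustration). The function names and the `min_gain` stopping threshold are hypothetical.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete 1-D arrays."""
    _, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Joint frequency table built from co-occurrence counts.
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def greedy_forward_select(X, y, max_features, min_gain=0.0):
    """Greedy forward search: relevance to y minus mean redundancy.

    Assumption: `min_gain` is an illustrative stopping threshold, not the
    stopping criterion developed in the thesis.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < max_features:
        best_j, best_score = None, -np.inf
        for j in remaining:
            redundancy = (
                np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        if best_score <= min_gain:
            break  # no remaining feature adds information beyond the threshold
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Calling `greedy_forward_select(X, y, max_features=k)` on an integer-coded feature matrix returns the column indices in the order they were chosen. A feature that duplicates an already selected one scores zero under this criterion and is skipped, which mirrors the redundancy handling the abstract mentions. Continuous features would need a different MI estimator, which this sketch does not attempt.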

Similar Items