Automating feature engineering in machine learning

Bibliographic Details
Main Author: Zhu, Haoyu
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/141857
Institution: Nanyang Technological University
Description
Summary: Feature engineering is a vital part of machine learning that transforms large volumes of raw data into a usable feature set, and many feature engineering algorithms have been proposed to improve the efficiency and accuracy of this process. Automated feature engineering has also been studied. For example, the AutoLearn feature learning algorithm automatically generates new features from the linear relationships among existing features and then performs feature selection on the new feature set [1]. This project focuses on one part of feature engineering, feature selection, and develops a unified function that integrates different types of feature selection methods so that the process can be automated through hyperparameter optimization. The goals of the project are to build a unified function that can run all types of feature selection algorithms and to automate the process by tuning as many hyperparameters as possible automatically. Feature selection algorithms and classification models were explored to inform the design of the unified function. Automated feature selection was also tested on wrapper methods using Bayesian optimization. As the deliverable of this project, the unified function can run any kind of feature selection algorithm and output a feature set that can potentially improve classification performance. Experiments with the unified function on different datasets showed its potential to eliminate the manual effort of selecting a subset of useful features.
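
The idea of a unified function whose method and settings are treated as tunable hyperparameters can be illustrated with a minimal sketch. This is not the project's implementation: the names `select_features`, `centroid_accuracy`, and `tune` are hypothetical, only a filter method and a wrapper method are shown, the nearest-centroid classifier is an assumed stand-in for the classification models, and a plain exhaustive search is swapped in for the Bayesian optimization mentioned in the summary.

```python
from statistics import mean


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(cov) / (vx * vy) ** 0.5


def centroid_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`."""
    correct = 0
    for i, (row, label) in enumerate(zip(X, y)):
        centroids = {}
        for c in set(y):
            members = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            if members:
                centroids[c] = [mean(m[f] for m in members) for f in cols]
        point = [row[f] for f in cols]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == label
    return correct / len(X)


def select_features(X, y, method="filter", k=1):
    """Hypothetical unified entry point: return sorted indices of k columns."""
    n_features = len(X[0])
    if method == "filter":
        # Filter: rank each column on its own by correlation with the label.
        scores = [pearson_abs([r[f] for r in X], y) for f in range(n_features)]
        ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
        return sorted(ranked[:k])
    if method == "wrapper":
        # Wrapper: greedy forward selection scored by the classifier itself.
        chosen = []
        while len(chosen) < k:
            remaining = [f for f in range(n_features) if f not in chosen]
            best = max(remaining, key=lambda f: centroid_accuracy(X, y, chosen + [f]))
            chosen.append(best)
        return sorted(chosen)
    raise ValueError(f"unknown method: {method!r}")


def tune(X, y, methods=("filter", "wrapper"), ks=(1, 2)):
    """Exhaustive search over (method, k); a stand-in for Bayesian optimization."""
    best = None
    for method in methods:
        for k in ks:
            cols = select_features(X, y, method, k)
            score = centroid_accuracy(X, y, cols)
            if best is None or score > best[0]:
                best = (score, method, k, cols)
    return best


# Toy data (invented for illustration): column 0 separates the classes,
# column 1 is noise, column 2 is constant.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.0, 3.0, 1.0],
    [0.9, 4.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.1, 5.5, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

print(select_features(X, y, method="filter", k=1))
print(select_features(X, y, method="wrapper", k=1))
print(tune(X, y))
```

Exposing the selection method and the subset size `k` as ordinary arguments is what lets a single search loop tune both; a Bayesian optimizer would replace the nested loops in `tune` while calling the same function.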