Automating feature engineering in machine learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/141857 |
Institution: | Nanyang Technological University |
Summary: | Feature engineering is a vital part of machine learning that transforms large volumes of raw data into a usable feature set, and many feature engineering algorithms have been proposed to improve the efficiency and accuracy of this process. Automated feature engineering has also been studied. For example, the AutoLearn feature learning algorithm automatically generates new features from the linear relationships among existing features and then performs feature selection on the new feature set [1]. This project focuses on one part of feature engineering, feature selection, and develops a unified function that integrates different types of feature selection methods so that the process can be automated through hyperparameter optimization. The goals of the project are to provide a unified function that can implement all types of feature selection algorithms and to automate the process by tuning as many hyperparameters as possible automatically. Feature selection algorithms and classification models were explored to inform the design of the unified function. Automated feature selection was also tested on wrapper methods using Bayesian optimization. As the deliverable of this project, the unified function can implement any kind of feature selection algorithm and output a feature set that can potentially improve classification performance. Experiments with the unified function on different datasets showed its potential to eliminate the manual effort of selecting a subset of useful features. |
---|
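The idea of a single entry point that dispatches to different kinds of feature selection methods, with the method and its settings treated as hyperparameters, can be illustrated with a small sketch. This is not the project's implementation: every name here (`select_features`, `tune`, `TOY_X`, `TOY_Y`) is hypothetical, the data is made up, the filter and wrapper methods are deliberately simple stand-ins, and an exhaustive grid search is used in place of the Bayesian optimization mentioned in the summary.

```python
import math
from itertools import product

# Toy data, made up for illustration: column 0 separates the two classes,
# column 1 is noise, column 2 is constant.
TOY_X = [[0.0, 5.0, 1.0], [0.1, 1.0, 1.0], [0.2, 4.0, 1.0],
         [1.0, 2.0, 1.0], [1.1, 5.0, 1.0], [1.2, 1.0, 1.0]]
TOY_Y = [0, 0, 0, 1, 1, 1]


def abs_pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov / math.sqrt(var_a * var_b))


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on `features`.

    Training accuracy is optimistic; a real evaluation would use
    held-out data or cross-validation.
    """
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(X, y) if target == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]

    def distance(row, label):
        return sum((row[j] - m) ** 2 for j, m in zip(features, centroids[label]))

    hits = sum(min(centroids, key=lambda c: distance(row, c)) == target
               for row, target in zip(X, y))
    return hits / len(y)


def filter_select(X, y, k):
    """Filter method: keep the k features most correlated with the labels."""
    n_features = len(X[0])
    scores = [abs_pearson([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def wrapper_select(X, y, k):
    """Wrapper method: greedy forward selection scored by the classifier."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


METHODS = {"filter": filter_select, "wrapper": wrapper_select}


def select_features(X, y, method="filter", k=1):
    """Unified entry point: dispatch to the requested selection method."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return METHODS[method](X, y, k)


def tune(X, y, methods=("filter", "wrapper"), ks=(1, 2)):
    """Grid search over (method, k); a stand-in for Bayesian optimization."""
    best = None
    for method, k in product(methods, ks):
        features = select_features(X, y, method, k)
        score = centroid_accuracy(X, y, features)
        if best is None or score > best["score"]:
            best = {"method": method, "k": k, "features": features, "score": score}
    return best
```

Calling `tune(TOY_X, TOY_Y)` tries every combination of method and subset size and returns the first one with the highest score. Treating the method name as one more hyperparameter is what lets a single search cover different families of selection algorithms; a Bayesian optimizer would replace the exhaustive loop with a guided search over the same space.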