Hierarchical ensemble learning method in diversified dataset analysis
Advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant progress in the analysis of large datasets. However, these algorithms use only the existing features during learning, which imposes an upper limit on accuracy no matter how we...
Main Authors: | Liu, Zeyuan, Li, Xinlong |
---|---|
Other Authors: | Nanyang Business School |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | Business::Information technology, Categorical Variables, Classification Accuracy |
Online Access: | https://hdl.handle.net/10356/161502 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-161502 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-161502 2023-05-19T07:31:19Z Hierarchical ensemble learning method in diversified dataset analysis Liu, Zeyuan Li, Xinlong Nanyang Business School Business::Information technology Categorical Variables Classification Accuracy Advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant progress in the analysis of large datasets. However, these algorithms use only the existing features during learning, which imposes an upper limit on accuracy no matter how good the algorithm is. Moreover, classification accuracy is especially low when one class of observations makes up a much smaller proportion of the training dataset than the other classes. The aim of the present study is to design a hierarchical classifier that extracts new features using ensemble machine learning regressors and statistical methods within the overall machine learning process. In stage 1, all the categorical variables are characterized by a random forest algorithm to create a new variable through regression analysis, while the remaining numerical variables serve as the sample for a factor analysis (FA) process that calculates the factor values of each observation. Then, in stage 2, all the features are learned by a random forest classifier. Diversified datasets consisting of categorical and numerical variables are used with the method. The experimental results show that the classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy of observations that have a low proportion in the training dataset. Published version 2022-09-06T04:24:27Z 2022-09-06T04:24:27Z 2021 Journal Article Liu, Z. & Li, X. (2021). Hierarchical ensemble learning method in diversified dataset analysis. Journal of Physics: Conference Series, 2078(1), 012027. https://dx.doi.org/10.1088/1742-6596/2078/1/012027 1742-6588 https://hdl.handle.net/10356/161502 10.1088/1742-6596/2078/1/012027 2-s2.0-85120488752 1 2078 012027 en Journal of Physics: Conference Series © 2021 The Authors. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Business::Information technology Categorical Variables Classification Accuracy |
description |
Advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant progress in the analysis of large datasets. However, these algorithms use only the existing features during learning, which imposes an upper limit on accuracy no matter how good the algorithm is. Moreover, classification accuracy is especially low when one class of observations makes up a much smaller proportion of the training dataset than the other classes. The aim of the present study is to design a hierarchical classifier that extracts new features using ensemble machine learning regressors and statistical methods within the overall machine learning process. In stage 1, all the categorical variables are characterized by a random forest algorithm to create a new variable through regression analysis, while the remaining numerical variables serve as the sample for a factor analysis (FA) process that calculates the factor values of each observation. Then, in stage 2, all the features are learned by a random forest classifier. Diversified datasets consisting of categorical and numerical variables are used with the method. The experimental results show that the classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy of observations that have a low proportion in the training dataset. |
author2 |
Nanyang Business School |
author_facet |
Nanyang Business School Liu, Zeyuan Li, Xinlong |
format |
Article |
author |
Liu, Zeyuan Li, Xinlong |
author_sort |
Liu, Zeyuan |
title |
Hierarchical ensemble learning method in diversified dataset analysis |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/161502 |
_version_ |
1772825128589066240 |
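
The two-stage procedure summarized in the description field of this record can be sketched roughly as follows. This is an illustrative reading of the abstract using scikit-learn, not the authors' implementation. Several details are assumptions because the record does not state them: the class name `HierarchicalForest` is hypothetical, the stage 1 regression target is taken to be the integer-coded class label, out-of-fold predictions are used to build the new variable, stage 2 sees only the derived features, and the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class HierarchicalForest:
    """Hypothetical two-stage sketch: derive features, then classify."""

    def __init__(self, n_factors=2, random_state=0):
        self.encoder = OneHotEncoder(handle_unknown="ignore")
        self.regressor = RandomForestRegressor(n_estimators=100, random_state=random_state)
        self.scaler = StandardScaler()
        self.factors = FactorAnalysis(n_components=n_factors, random_state=random_state)
        self.classifier = RandomForestClassifier(n_estimators=100, random_state=random_state)

    def fit(self, X_cat, X_num, y):
        # Stage 1a (assumption): regress the integer-coded label on the
        # one-hot categorical columns. Out-of-fold predictions are used so
        # the training-time feature is not a memorised copy of the label.
        cat = self.encoder.fit_transform(X_cat).toarray()
        cat_score = cross_val_predict(self.regressor, cat, y, cv=5)
        self.regressor.fit(cat, y)
        # Stage 1b: factor scores of the standardised numerical columns.
        scores = self.factors.fit_transform(self.scaler.fit_transform(X_num))
        # Stage 2: random forest classifier on the derived features.
        self.classifier.fit(np.column_stack([cat_score, scores]), y)
        return self

    def predict(self, X_cat, X_num):
        cat = self.encoder.transform(X_cat).toarray()
        cat_score = self.regressor.predict(cat)
        scores = self.factors.transform(self.scaler.transform(X_num))
        return self.classifier.predict(np.column_stack([cat_score, scores]))


# Synthetic mixed-type data, for illustration only.
rng = np.random.default_rng(0)
n = 400
X_cat = rng.integers(0, 3, size=(n, 2)).astype(str)   # two categorical columns
X_num = rng.normal(size=(n, 4))                       # four numerical columns
y = ((X_cat[:, 0] == "2") | (X_num[:, 0] > 1.0)).astype(int)

model = HierarchicalForest().fit(X_cat[:300], X_num[:300], y[:300])
pred = model.predict(X_cat[300:], X_num[300:])
accuracy = (pred == y[300:]).mean()
print("held-out accuracy:", accuracy)
```

Regressing an integer-coded label is only natural for binary or ordered classes; for unordered multi-class targets, a different stage 1 target (for example one regressor per class indicator) would be needed, and the record does not say which choice the paper makes.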