Hierarchical ensemble learning method in diversified dataset analysis

Bibliographic Details
Main Authors: Liu, Zeyuan; Li, Xinlong
Other Authors: Nanyang Business School
Format: Article
Language: English
Published: 2022 (repository record; the journal article was issued in 2021)
Online Access: https://hdl.handle.net/10356/161502
Institution: Nanyang Technological University
Version: Published version
Type: Journal Article
Date issued: 2021
Date available: 2022-09-06
Last modified: 2023-05-19
Citation: Liu, Z. & Li, X. (2021). Hierarchical ensemble learning method in diversified dataset analysis. Journal of Physics: Conference Series, 2078(1), 012027.
DOI: https://dx.doi.org/10.1088/1742-6596/2078/1/012027
ISSN: 1742-6588
Scopus ID: 2-s2.0-85120488752
Rights: © 2021 The Authors. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.
File format: application/pdf
Collection: DR-NTU (NTU Library)
Subjects: Business::Information technology; Categorical Variables; Classification Accuracy
Abstract: The remarkable advances in ensemble machine learning methods, such as random forest algorithms, have had a significant impact on the analysis of large datasets. However, these algorithms learn only from the existing features, which imposes an upper limit on accuracy no matter how well the algorithms perform. Moreover, classification accuracy is especially low when one class makes up a much smaller proportion of the training data than the other classes. The present study designs a hierarchical classifier that extracts new features with ensemble machine learning regressors and statistical methods within the overall learning process. In stage 1, a random forest regression on all the categorical variables creates a new variable, while the remaining numerical variables serve as the sample for a factor analysis (FA) that computes the factor values of each observation. In stage 2, a random forest classifier is trained on all the features. The method is applied to diversified datasets consisting of both categorical and numerical variables. Experimental results show that classification accuracy increases by 8.61%. The method also significantly improves the classification accuracy for observations whose class has a low proportion in the training dataset.
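The two-stage pipeline described in the abstract can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say what the stage-1 random forest regresses on, so here it is assumed to predict the integer-encoded class label from one-hot-encoded categorical variables (most natural for binary labels, since integer codes impose an arbitrary order on multiple classes). Out-of-fold predictions are used for the training rows to limit label leakage, and stage 2 is assumed to see the standardized numerical variables together with the new variable and the factor scores. The class name HierarchicalRFClassifier and all hyperparameters (n_factors, n_estimators, the 5-fold split) are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class HierarchicalRFClassifier:
    """Hypothetical two-stage classifier: stage 1 builds features, stage 2 classifies.

    Assumptions not fixed by the abstract: the stage-1 regressor predicts the
    integer-encoded class label from one-hot categorical columns, and stage 2
    uses the standardized numerical columns plus the new features.
    """

    def __init__(self, n_factors=2, random_state=0):
        self.n_factors = n_factors
        self.random_state = random_state

    def fit(self, X_cat, X_num, y):
        y = np.asarray(y)
        # Encode labels as 0..k-1 so a regressor can target them (assumption).
        self.classes_, y_code = np.unique(y, return_inverse=True)

        # Stage 1a: random forest regression on the categorical variables.
        self.encoder_ = OneHotEncoder(handle_unknown="ignore")
        C = self.encoder_.fit_transform(X_cat)  # sparse one-hot matrix
        self.regressor_ = RandomForestRegressor(
            n_estimators=100, random_state=self.random_state)
        # Out-of-fold predictions, so each training row's new feature comes
        # from a model that did not see that row's label.
        folds = KFold(n_splits=5, shuffle=True, random_state=self.random_state)
        cat_feature = cross_val_predict(self.regressor_, C, y_code, cv=folds)
        self.regressor_.fit(C, y_code)  # full-data model for new observations

        # Stage 1b: factor analysis on the standardized numerical variables.
        self.scaler_ = StandardScaler()
        Z = self.scaler_.fit_transform(X_num)
        self.fa_ = FactorAnalysis(
            n_components=self.n_factors, random_state=self.random_state)
        factors = self.fa_.fit_transform(Z)  # factor scores per observation

        # Stage 2: random forest classifier on all features.
        features = np.column_stack([Z, cat_feature, factors])
        self.classifier_ = RandomForestClassifier(
            n_estimators=200, random_state=self.random_state)
        self.classifier_.fit(features, y_code)
        return self

    def _features(self, X_cat, X_num):
        # Rebuild the stage-2 feature matrix for unseen observations.
        C = self.encoder_.transform(X_cat)
        Z = self.scaler_.transform(X_num)
        return np.column_stack(
            [Z, self.regressor_.predict(C), self.fa_.transform(Z)])

    def predict(self, X_cat, X_num):
        # Map the classifier's integer codes back to the original labels.
        codes = self.classifier_.predict(self._features(X_cat, X_num))
        return self.classes_[codes]
```

For example, HierarchicalRFClassifier(n_factors=2).fit(X_cat, X_num, y).predict(X_cat_new, X_num_new) returns labels from the original label set. Because the abstract emphasizes accuracy on low-proportion classes, per-class recall (for instance sklearn.metrics.recall_score with average=None) is likely a more informative check than overall accuracy alone.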