Hierarchical ensemble learning method in diversified dataset analysis

Remarkable advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant analysis of large data. However, these algorithms use only the existing features during learning, which places an upper limit on accuracy no matter how we...

Bibliographic Details
Main Authors:	Liu, Zeyuan; Li, Xinlong
Other Authors:	Nanyang Business School
Format:	Article
Language:	English
Published:	2022
Subjects:	Business::Information technology; Categorical Variables; Classification Accuracy
Online Access:	https://hdl.handle.net/10356/161502
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-161502
record_format	dspace
spelling	sg-ntu-dr.10356-161502 2023-05-19T07:31:19Z Published version 2022-09-06T04:24:27Z 2021 Journal Article
Liu, Z. & Li, X. (2021). Hierarchical ensemble learning method in diversified dataset analysis. Journal of Physics: Conference Series, 2078(1), 012027.
https://dx.doi.org/10.1088/1742-6596/2078/1/012027 1742-6588 https://hdl.handle.net/10356/161502 10.1088/1742-6596/2078/1/012027 2-s2.0-85120488752 en
© 2021 The Authors. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	Remarkable advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant analysis of large data. However, these algorithms use only the existing features during learning, which places an upper limit on accuracy no matter how good the algorithms are. Moreover, classification accuracy is especially low when the proportion of one type of observation in the training dataset is much lower than that of the other types. The aim of the present study is to design a hierarchical classifier that extracts new features using ensemble machine learning regressors and statistical methods within the overall machine learning process. In stage 1, all the categorical variables are characterized by a random forest algorithm to create a new variable through regression analysis, while the remaining numerical variables serve as the sample for a factor analysis (FA) process that calculates the factor values of each observation. In stage 2, all the features are learned by a random forest classifier. The method is applied to diversified datasets consisting of categorical and numerical variables. The experimental results show that classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy of observations with a low proportion in the training dataset.
_version_	1772825128589066240
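
The two-stage procedure summarized in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which target the stage-1 regression uses, so the sketch assumes it is the 0/1 class label; the out-of-fold prediction step, the number of factors, the use of only the derived features in stage 2, the synthetic data, and the helper names stage1_features and fit_stage2 are all hypothetical choices made for the example. It relies on scikit-learn and NumPy.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import OneHotEncoder


def stage1_features(X_cat, X_num, y, n_factors=2, seed=0):
    """Stage 1 (sketch): one RF-regression variable plus FA factor scores."""
    # One-hot encode the categorical columns so the forest can consume them.
    encoder = OneHotEncoder(handle_unknown="ignore")
    cat_encoded = encoder.fit_transform(X_cat)

    # Assumption: the regression target is the 0/1 class label.
    # Out-of-fold predictions (also an assumption) keep the new variable
    # from simply memorising y on the training rows.
    regressor = RandomForestRegressor(n_estimators=50, random_state=seed)
    cat_score = cross_val_predict(regressor, cat_encoded, y.astype(float), cv=5)

    # Factor analysis on the numerical columns yields factor scores
    # for each observation.
    fa = FactorAnalysis(n_components=n_factors, random_state=seed)
    factor_scores = fa.fit_transform(X_num)

    return np.column_stack([cat_score, factor_scores])


def fit_stage2(features, y, seed=0):
    """Stage 2 (sketch): random forest classifier on the stage-1 features."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    return clf.fit(features, y)


# Synthetic mixed-type data, invented purely for illustration.
rng = np.random.default_rng(0)
n = 300
X_cat = rng.integers(0, 4, size=(n, 3)).astype(str)  # 3 categorical columns
X_num = rng.normal(size=(n, 5))                      # 5 numerical columns
y = ((X_cat[:, 0] == "1") | (X_num[:, 0] > 0.5)).astype(int)

features = stage1_features(X_cat, X_num, y)
model = fit_stage2(features, y)
```

In a real evaluation, the stage-1 encoder, regressor, and factor model would be fitted on the training split only and then applied to held-out data before scoring the stage-2 classifier.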
