Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data

The induction error in random tree ensembling results mainly from the strength of decision trees and the dependency between base classifiers. In order to reduce the errors due to both factors, a Semi-Random Decision Tree Ensembling (SRDTE) for mining streaming data is proposed based on our previous...

Bibliographic Details
Main Authors:	LI, Peipei, LIANG, Qianhui (Althea), WU, Xindong, Hu, X.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2009
Subjects:	Random decision trees - data streams - parameter estimation; Computer Sciences
Online Access:	https://ink.library.smu.edu.sg/sis_research/454 http://dx.doi.org/10.1007/978-3-642-01307-2_35
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-1453
record_format	dspace
spelling	sg-smu-ink.sis_research-1453 2010-09-24T06:36:22Z Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data LI, Peipei LIANG, Qianhui (Althea) WU, Xindong Hu, X. The induction error in random tree ensembling results mainly from the strength of decision trees and the dependency between base classifiers. In order to reduce the errors due to both factors, a Semi-Random Decision Tree Ensembling (SRDTE) for mining streaming data is proposed based on our previous work on SRMTDS. The model contains semi-random decision trees that are independent in the generation process and have no interaction with each other in the individual decisions of classification. The main idea is to minimize correlation among the classifiers. We claim that the strength of decision trees is closely related to the estimation values of the parameters, including the height of a tree, the count of trees and the parameter of n min in the Hoeffding Bounds. We analyze these parameters of the model and design strategies for better adaptation to streaming data. The main strategies include an incremental generation of sub-trees after seeing real training instances, a data structure for quick search and a voting mechanism for classification. Our evaluation in the 0-1 loss function shows that SRDTE has improved the performance in terms of predictive accuracy and robustness. We have applied SRDTE to e-business data streams and proved its feasibility and effectiveness. 2009-03-01T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/454 info:doi/10.1007/978-3-642-01307-2_35 http://dx.doi.org/10.1007/978-3-642-01307-2_35 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Random decision trees - data streams - parameter estimation Computer Sciences
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Random decision trees - data streams - parameter estimation Computer Sciences
spellingShingle	Random decision trees - data streams - parameter estimation Computer Sciences LI, Peipei LIANG, Qianhui (Althea) WU, Xindong Hu, X. Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data
description	The induction error in random tree ensembling results mainly from the strength of decision trees and the dependency between base classifiers. To reduce the errors due to both factors, a Semi-Random Decision Tree Ensembling (SRDTE) model for mining streaming data is proposed, based on our previous work on SRMTDS. The model contains semi-random decision trees that are generated independently and do not interact with each other when making individual classification decisions. The main idea is to minimize correlation among the classifiers. We claim that the strength of decision trees is closely related to the estimated values of the parameters, including the height of a tree, the number of trees and the parameter n_min in the Hoeffding bounds. We analyze these parameters of the model and design strategies for better adaptation to streaming data. The main strategies include an incremental generation of sub-trees after seeing real training instances, a data structure for quick search and a voting mechanism for classification. Our evaluation under the 0-1 loss function shows that SRDTE improves performance in terms of predictive accuracy and robustness. We have applied SRDTE to e-business data streams and proved its feasibility and effectiveness.
format	text
author	LI, Peipei LIANG, Qianhui (Althea) WU, Xindong Hu, X.
author_facet	LI, Peipei LIANG, Qianhui (Althea) WU, Xindong Hu, X.
author_sort	LI, Peipei
title	Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data
title_short	Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data
title_full	Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data
title_fullStr	Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data
title_full_unstemmed	Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data
title_sort	parameter estimation in semi-random decision tree ensembling on streaming data
publisher	Institutional Knowledge at Singapore Management University
publishDate	2009
url	https://ink.library.smu.edu.sg/sis_research/454 http://dx.doi.org/10.1007/978-3-642-01307-2_35
_version_	1770570431142559744

Similar Items