Winsorize tree algorithm for handling outliers in classification problem

Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, outliers can affect the structure of a CART tree, its node purity, and its predictive accuracy in classification. Some researchers opt to perform pre-pruning...

Bibliographic Details
Main Author: Ch’ng, Chee Keong
Format: Thesis
Language: English
Published: 2016
Subjects: QA273-280 Probabilities. Mathematical statistics
Online Access: https://etd.uum.edu.my/5780/1/depositpermission_s92068.pdf
https://etd.uum.edu.my/5780/14/s92068_01.pdf
https://etd.uum.edu.my/5780/
Institution: Universiti Utara Malaysia
id my.uum.etd.5780
record_format eprints
spelling my.uum.etd.5780 2024-09-21T12:53:12Z https://etd.uum.edu.my/5780/ Ch’ng, Chee Keong (2016) Winsorize tree algorithm for handling outliers in classification problem. PhD thesis, Universiti Utara Malaysia. Thesis, NonPeerReviewed.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic QA273-280 Probabilities. Mathematical statistics
description Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, outliers can affect the structure of a CART tree, its node purity, and its predictive accuracy in classification. Some researchers opt to perform pre-pruning or post-pruning of the CART tree to handle the outliers. This study proposes a modified classification tree algorithm, called the Winsorize tree, based on the distribution of classes in the training dataset. The Winsorize tree checks for possible outliers at every node before evaluating the candidate splitting points, so as to obtain the nodes with the highest purity. The upper and lower fences of a boxplot are used to detect potential outliers, that is, values lying below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR is the interquartile range. The identified outliers are neutralized using the Winsorize method, and the Winsorize Gini index is then used to compute the divergences among the probability distributions of the target predictor’s values until the stopping criteria are met. This study uses three stopping rules: a node reaches the minimum of 10% of the total training set,
format Thesis
author Ch’ng, Chee Keong
title Winsorize tree algorithm for handling outliers in classification problem
publishDate 2016
url https://etd.uum.edu.my/5780/1/depositpermission_s92068.pdf
https://etd.uum.edu.my/5780/14/s92068_01.pdf
https://etd.uum.edu.my/5780/
_version_ 1811688486748553216
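
The outlier handling described in the abstract (boxplot fences followed by Winsorizing, then a Gini impurity computation) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `boxplot_fences`, `winsorize`, and `gini` are hypothetical, the quartiles use the standard library's "inclusive" interpolation, outliers are assumed to be replaced by the nearest fence value, and the plain Gini index stands in for the thesis's Winsorize Gini index, whose exact definition is not given in this record.

```python
from collections import Counter
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Return the (lower, upper) boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles use linear interpolation ("inclusive" method).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values, k=1.5):
    """Clip every value lying outside the fences to the nearest fence.

    Assumption: the replacement value is the fence itself; the thesis may
    use a different replacement rule.
    """
    lower, upper = boxplot_fences(values, k)
    return [min(max(v, lower), upper) for v in values]


def gini(labels):
    """Plain Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(boxplot_fences(values))      # (-3.0, 13.0)
print(winsorize(values))           # the value 100 is pulled back to 13.0
print(gini(["a", "a", "b", "b"]))  # 0.5
```

In a tree that follows the abstract's description, a step like `winsorize` would be applied to each predictor within a node before candidate split points are scored by impurity.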