Oblique decision tree ensemble via twin bounded SVM

Bibliographic Details
Main Authors: Ganaie, M. A., Tanveer, M., Suganthan, Ponnuthurai Nagaratnam
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/161155
Institution: Nanyang Technological University
Description
Summary: Ensemble methods based on the “perturb and combine” strategy have shown improved performance on classification problems. Recently, the random forest algorithm was ranked first among 179 classifiers evaluated on 121 UCI datasets. Motivated by this, we propose a new approach for generating oblique decision trees. At each non-leaf node, the training samples are grouped into two categories based on the Bhattacharyya distance, computed on a randomly selected feature subset. A twin bounded support vector machine (TBSVM) is then used to obtain two clustering hyperplanes such that each hyperplane is closer to the data points of one group and as far as possible from the data points of the other group. Each non-leaf node is split based on these hyperplanes to grow the decision tree. In this paper, we use different base models, namely random forest (RaF), rotation forest (RoF) and random sub rotation forest (RRoF), to generate oblique decision tree forests named TBRaF, TBRoF and TBRRoF, respectively. In earlier oblique decision trees, such as those based on the multisurface proximal support vector machine (MPSVM), the matrices are positive semi-definite and hence different regularization methods are required. In contrast, the matrices in the proposed TBRaF, TBRoF and TBRRoF are positive definite, so no explicit regularization techniques need to be applied to the primal problems. We evaluated the performance of the proposed models (TBRaF, TBRoF and TBRRoF) on 49 datasets taken from the UCI repository and on some real-world biological datasets (not in UCI). The experimental results and statistical tests show that TBRaF and TBRRoF outperform the other baseline methods.
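
Illustration: the node-splitting procedure described in the summary can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (bhattacharyya, group_classes, fit_twin_planes, route_left) and the parameters c1, c2, c3 and eps are hypothetical. The grouping rule (seed the two groups with the farthest pair of classes, then attach each remaining class to the nearer seed) is assumed, since the summary does not spell it out. The hyperplanes are fitted with a least-squares simplification that has a closed-form solution, swapped in for the inequality-constrained TBSVM problems; it keeps a TBSVM-style regularization term c3, which is what makes the system matrices positive definite. Random feature-subset selection and the recursion over child nodes are omitted.

```python
import numpy as np

def bhattacharyya(X1, X2, eps=1e-6):
    """Bhattacharyya distance between two classes modelled as Gaussians."""
    d = X1.shape[1]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # eps * I is an assumed ridge that keeps the covariance estimates invertible.
    S1 = np.cov(X1, rowvar=False).reshape(d, d) + eps * np.eye(d)
    S2 = np.cov(X2, rowvar=False).reshape(d, d) + eps * np.eye(d)
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(S, diff)
    _, ld = np.linalg.slogdet(S)
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    return mean_term + 0.5 * (ld - 0.5 * (ld1 + ld2))

def group_classes(X, y):
    """Assumed rule: farthest class pair seeds two groups; others join the nearer seed."""
    labels = np.unique(y).tolist()
    if len(labels) == 2:
        return {labels[0]}, {labels[1]}
    dist = {(a, b): bhattacharyya(X[y == a], X[y == b])
            for i, a in enumerate(labels) for b in labels[i + 1:]}
    a, b = max(dist, key=dist.get)
    g1, g2 = {a}, {b}
    for c in labels:
        if c in (a, b):
            continue
        to_a = dist[tuple(sorted((a, c)))]
        to_b = dist[tuple(sorted((b, c)))]
        (g1 if to_a <= to_b else g2).add(c)
    return g1, g2

def fit_twin_planes(A, B, c1=1.0, c2=1.0, c3=1e-2):
    """Least-squares stand-in for TBSVM: two hyperplanes [w; b], one per group.

    Plane u is close to A and about unit distance from B; plane v the reverse.
    The c3 * I term makes both system matrices positive definite.
    """
    E = np.hstack([A, np.ones((len(A), 1))])
    F = np.hstack([B, np.ones((len(B), 1))])
    I = np.eye(E.shape[1])
    u = -np.linalg.solve(E.T @ E + c1 * F.T @ F + c3 * I,
                         c1 * F.T @ np.ones(len(B)))
    v = np.linalg.solve(F.T @ F + c2 * E.T @ E + c3 * I,
                        c2 * E.T @ np.ones(len(A)))
    return u, v

def route_left(X, u, v):
    """Send each sample to the child whose hyperplane is nearer (True = first group)."""
    d1 = np.abs(X @ u[:-1] + u[-1]) / np.linalg.norm(u[:-1])
    d2 = np.abs(X @ v[:-1] + v[-1]) / np.linalg.norm(v[:-1])
    return d1 <= d2
```

At a node one would call group_classes on the node's samples, fit the two planes on the resulting groups, and use route_left to partition the samples between the two children. Replacing fit_twin_planes with a solver for the actual TBSVM quadratic programs would leave the rest of the sketch unchanged.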