Oblique decision tree ensemble via twin bounded SVM
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/161155 |
Institution: | Nanyang Technological University |
Summary: | Ensemble methods based on the “perturb and combine” strategy have shown improved performance on classification problems. Recently, the random forest algorithm was ranked first among 179 classifiers evaluated on 121 UCI datasets. Motivated by this, we propose a new approach for generating oblique decision trees. At each non-leaf node, the training samples are grouped into two categories based on the Bhattacharyya distance computed on a randomly selected feature subset. Then, a twin bounded support vector machine (TBSVM) is used to obtain two clustering hyperplanes such that each hyperplane is close to the data points of one group and as far as possible from the data points of the other group. Based on these hyperplanes, each non-leaf node is split to grow the decision tree. In this paper, we use different base models, namely random forest (RaF), rotation forest (RoF) and random sub rotation forest (RRoF), to generate oblique decision tree forests named TBRaF, TBRoF and TBRRoF, respectively. In earlier oblique decision trees, such as those based on the multisurface proximal support vector machine (MPSVM), the matrices are positive semi-definite and hence different regularization methods are required. In the proposed TBRaF, TBRoF and TBRRoF, however, the matrices are positive definite, so no explicit regularization techniques need to be applied to the primal problems. We evaluated the proposed models (TBRaF, TBRoF and TBRRoF) on 49 datasets from the UCI repository and on some real-world biological datasets (not in UCI). The experimental results and statistical tests show that TBRaF and TBRRoF outperform the other baseline methods. |
---|
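
The node-splitting idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: classes are modelled as Gaussians with diagonal covariance, the two groups are seeded by the pair of classes with the largest Bhattacharyya distance, and the two hyperplanes `(w1, b1)` and `(w2, b2)` are taken as given rather than obtained by solving the TBSVM problems. The function names `bhattacharyya_diag`, `group_classes` and `route` are hypothetical.

```python
import numpy as np


def bhattacharyya_diag(x_a, x_b, eps=1e-9):
    """Bhattacharyya distance between two sample sets, each modelled as a
    Gaussian with diagonal covariance (a simplifying assumption)."""
    m_a, m_b = x_a.mean(axis=0), x_b.mean(axis=0)
    v_a, v_b = x_a.var(axis=0) + eps, x_b.var(axis=0) + eps
    v = 0.5 * (v_a + v_b)
    mean_term = 0.125 * np.sum((m_a - m_b) ** 2 / v)
    cov_term = 0.5 * np.sum(np.log(v / np.sqrt(v_a * v_b)))
    return float(mean_term + cov_term)


def group_classes(X, y, feature_idx):
    """Assign every sample to group 0 or 1 using only the chosen features.

    Assumed heuristic: the two classes that are farthest apart seed the
    groups, and each remaining class joins the nearer seed.
    """
    Xs = X[:, feature_idx]
    labels = np.unique(y).tolist()
    if len(labels) < 2:
        raise ValueError("need at least two classes to split a node")
    dist = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            dist[(a, b)] = bhattacharyya_diag(Xs[y == a], Xs[y == b])

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    seed_0, seed_1 = max(dist, key=dist.get)
    group = {seed_0: 0, seed_1: 1}
    for c in labels:
        if c not in group:
            group[c] = 0 if d(c, seed_0) <= d(c, seed_1) else 1
    return np.array([group[c] for c in y.tolist()])


def route(X, w1, b1, w2, b2):
    """Send each sample to the child (0 or 1) whose hyperplane is nearer."""
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 0, 1)
```

In a forest, `feature_idx` would be drawn at random at every node, and the hyperplanes passed to `route` would come from a TBSVM solver trained on the two groups returned by `group_classes`.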