A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification
Main Authors: | , , |
---|---|
Format: | Article |
Published: | Elsevier, 2021 |
Online Access: | http://eprints.um.edu.my/27148/ |
Institution: | Universiti Malaya |
Summary: | A stacked ensemble, which uses a meta-learner to combine (stack) the predictions of multiple base classifiers, performs suboptimally on imbalanced classification. To improve the classification performance of stacked ensembles on imbalanced datasets, we propose a method named Neighborhood Undersampling Stacked Ensemble (NUS-SE) in this paper. NUS-SE consists of two proposed components: an undersampling-based stacked ensemble framework (US-SE) and an undersampling technique. In the metadata generation step of a stacked ensemble, a cross-validation-like procedure (CV-prediction) is commonly used. Unfortunately, incomplete metadata with missing prediction values is generated when undersampling is performed within a stacked ensemble that uses CV-prediction as the metadata generation procedure. Therefore, in the proposed US-SE component, we replace the standard CV-prediction procedure with our proposed Subset and Out-of-Subset (S-OOS) prediction procedure as the metadata generation method. The S-OOS prediction procedure generates metadata without missing prediction values, thus enabling the integration of undersampling within a stacked ensemble. With undersampling integrated in this way, multiple undersampled data subsets are used to train US-SE's base learners. In the undersampling component, we further propose a novel undersampling technique, Neighborhood Undersampling (NUS), which selects majority instances based on their local neighborhood information. The performance of NUS-SE is evaluated against non-resampling-based stacked ensembles as baseline methods. The experiment demonstrates that the proposed NUS-SE, an undersampling-based stacked ensemble, achieves better performance than the non-resampling-based stacked ensembles. |
---|
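The missing-metadata problem described in the summary can be illustrated with a small sketch. The abstract does not give the details of the S-OOS procedure, so the code below is only an assumption-laden illustration, not the authors' method: it uses plain random undersampling (not NUS), a toy nearest-centroid base learner on one feature, and a hypothetical fill step in which a model trained on the undersampled subset predicts the out-of-subset instances. All function names are invented for this example.

```python
import random

def undersample(labels, rng):
    """Random undersampling (assumption, not NUS): keep every minority (1)
    index and an equal number of randomly chosen majority (0) indices."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    return sorted(minority + rng.sample(majority, len(minority)))

def fit_centroids(xs, labels, idx):
    """Toy base learner: per-class mean of a 1-D feature over the given indices."""
    centroids = {}
    for c in (0, 1):
        vals = [xs[i] for i in idx if labels[i] == c]
        centroids[c] = sum(vals) / len(vals)
    return centroids

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def cv_metadata(xs, labels, subset, k):
    """CV-prediction restricted to an undersampled subset.
    Instances dropped by undersampling never land in a held-out fold,
    so their metadata entries stay None (the missing values)."""
    meta = [None] * len(xs)
    folds = [subset[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        model = fit_centroids(xs, labels, train)
        for i in folds[f]:
            meta[i] = predict(model, xs[i])
    return meta

def subset_oos_metadata(xs, labels, subset, k):
    """Hypothetical subset / out-of-subset fill: held-out predictions inside
    the subset, plus predictions for out-of-subset instances from a model
    trained on the whole subset."""
    meta = cv_metadata(xs, labels, subset, k)
    model = fit_centroids(xs, labels, subset)
    for i in range(len(xs)):
        if meta[i] is None:
            meta[i] = predict(model, xs[i])
    return meta

# Imbalanced toy data: 20 majority instances near 0, 5 minority instances near 5.
xs = [0.1 * i for i in range(20)] + [5.0 + 0.1 * j for j in range(5)]
labels = [0] * 20 + [1] * 5

subset = undersample(labels, random.Random(0))
cv_meta = cv_metadata(xs, labels, subset, k=2)
full_meta = subset_oos_metadata(xs, labels, subset, k=2)

print("missing with CV-prediction:", cv_meta.count(None))
print("missing with subset/out-of-subset fill:", full_meta.count(None))
```

In this toy setting the undersampled subset holds 10 of the 25 instances, so CV-prediction on the subset leaves the 15 discarded majority instances without metadata, while the fill step covers every instance. A real US-SE would repeat this over multiple undersampled subsets and base learners, as the summary describes.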