The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling

The redistribution of the target class by oversampling synthetic minority instances is one of the effective approaches to the class imbalance problem. Safe-Level SMOTE generates synthetic minority instances around original instances while avoiding nearby majority ones. However, despite this intentio...

Bibliographic Details
Main Authors: Wacharasak Siriseriwan, Krung Sinapiromsaran
Language: English
Published: Science Faculty of Chiang Mai University 2019
Subjects: class imbalance problem; oversampling; SMOTE; Safe-level SMOTE; minority outcast handling
Online Access: http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=6324
http://cmuir.cmu.ac.th/jspui/handle/6653943832/66081
Institution: Chiang Mai University
id th-cmuir.6653943832-66081
record_format dspace
spelling th-cmuir.6653943832-66081
2019-08-21T09:18:21Z
2016
Chiang Mai Journal of Science 43, 1 (Jan 2016), 234 - 246
0125-2526
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
language English
topic class imbalance problem
oversampling
SMOTE
Safe-level SMOTE
minority outcast handling
description The redistribution of the target class by oversampling synthetic minority instances is one of the effective approaches to the class imbalance problem. Safe-Level SMOTE generates synthetic minority instances around original instances while avoiding nearby majority ones. Despite this intention, some synthetic instances can still be placed too close to nearby majority instances, which may confuse some classifiers. Moreover, Safe-Level SMOTE avoids using minority outcast instances when generating synthetic instances, so the generated dataset may lose valuable information about the minority class. This paper aims to remedy these two drawbacks of Safe-Level SMOTE by combining two processes. The first checks the synthetic instances and moves them away from surrounding majority instances. The second handles minority outcasts with a 1-nearest neighbor model. Empirical results on UCI and PROMISE datasets show improvements in F-measure, a performance measure commonly used for the class imbalance problem, for various classifiers such as decision tree, naïve Bayes classifier, multilayer perceptron, support vector machine and k-nearest neighbor. The improvements are tested with the Wilcoxon sign test to show their significance.
author Wacharasak Siriseriwan
Krung Sinapiromsaran
title The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling
publisher Science Faculty of Chiang Mai University
publishDate 2019
url http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=6324
http://cmuir.cmu.ac.th/jspui/handle/6653943832/66081
_version_ 1681426388610973696
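
Illustrative note on the description: the abstract refers to safe levels and minority outcasts without defining them. The sketch below is not taken from this record or from the paper's own code. It assumes the usual Safe-Level SMOTE convention that the safe level of a minority instance is the number of minority instances among its k nearest neighbors, and it assumes that a minority outcast is a minority instance whose safe level is zero. The function name `safe_levels`, its parameters, and the toy data are hypothetical. The relocation step and the 1-nearest neighbor handling of outcasts are not sketched, because the record gives no details about them.

```python
import numpy as np

def safe_levels(X, y, minority_label=1, k=5):
    """Return (indices of minority instances, their safe levels).

    Assumption: safe level = number of minority instances among the
    k nearest neighbours (Euclidean distance), excluding the instance itself.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    levels = np.empty(len(minority_idx), dtype=int)
    for pos, i in enumerate(minority_idx):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # never count an instance as its own neighbour
        neighbours = np.argsort(dist, kind="stable")[:k]
        levels[pos] = int(np.sum(y[neighbours] == minority_label))
    return minority_idx, levels

# Toy data (hypothetical): three clustered minority points and one minority
# point sitting inside a majority cluster.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

idx, levels = safe_levels(X, y, minority_label=1, k=2)
# Assumed definition: outcasts are minority instances with safe level 0.
outcasts = idx[levels == 0]
print(dict(zip(idx.tolist(), levels.tolist())), outcasts.tolist())
```

In this toy example the minority point at (5.5, 5.5) has only majority neighbors, so under the stated assumptions it is the kind of instance that plain Safe-Level SMOTE would skip and that the paper proposes to handle separately.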