The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling
Redistributing the target class by oversampling with synthetic minority instances is an effective approach to the class imbalance problem. Safe-Level SMOTE generates synthetic minority instances around original instances while avoiding nearby majority ones. However, despite this intentio...
Main Authors: | Wacharasak Siriseriwan, Krung Sinapiromsaran |
---|---|
Language: | English |
Published: | Science Faculty of Chiang Mai University, 2019 |
Subjects: | class imbalance problem; oversampling; SMOTE; Safe-level SMOTE; minority outcast handling |
Online Access: | http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=6324 http://cmuir.cmu.ac.th/jspui/handle/6653943832/66081 |
Institution: | Chiang Mai University |
id |
th-cmuir.6653943832-66081 |
---|---|
record_format |
dspace |
spelling |
th-cmuir.6653943832-66081; 2019-08-21T09:18:21Z; The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling; Wacharasak Siriseriwan; Krung Sinapiromsaran; class imbalance problem; oversampling; SMOTE; Safe-level SMOTE; minority outcast handling; 2016; Chiang Mai Journal of Science 43, 1 (Jan 2016), 234-246; ISSN 0125-2526; http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=6324; http://cmuir.cmu.ac.th/jspui/handle/6653943832/66081; Eng; Science Faculty of Chiang Mai University |
institution |
Chiang Mai University |
building |
Chiang Mai University Library |
country |
Thailand |
collection |
CMU Intellectual Repository |
language |
English |
topic |
class imbalance problem; oversampling; SMOTE; Safe-level SMOTE; minority outcast handling |
spellingShingle |
class imbalance problem; oversampling; SMOTE; Safe-level SMOTE; minority outcast handling; Wacharasak Siriseriwan; Krung Sinapiromsaran; The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling |
description |
Redistributing the target class by oversampling with synthetic minority instances is an effective approach to the class imbalance problem. Safe-Level SMOTE generates synthetic minority instances around original instances while avoiding nearby majority ones. Despite this intention, some synthetic instances can still be placed too close to nearby majority instances, which may confuse some classifiers. Moreover, Safe-Level SMOTE avoids using minority outcast instances when generating synthetic instances, so the generated dataset may lose valuable information about the minority class. Our paper aims to remedy these two drawbacks of Safe-Level SMOTE by combining two processes. The first checks synthetic instances and moves them away from surrounding majority instances. The second handles minority outcasts with a 1-nearest neighbor model. Empirical results on UCI and PROMISE datasets show improvements in F-measure, a performance measure commonly used for the class imbalance problem, for various classifiers such as decision tree, naïve Bayes classifier, multilayer perceptron, support vector machine and K-nearest neighbor. The improvements are tested with the Wilcoxon sign test to show their significance. |
author |
Wacharasak Siriseriwan; Krung Sinapiromsaran |
author_facet |
Wacharasak Siriseriwan; Krung Sinapiromsaran |
author_sort |
Wacharasak Siriseriwan |
title |
The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling |
publisher |
Science Faculty of Chiang Mai University |
publishDate |
2019 |
url |
http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=6324 http://cmuir.cmu.ac.th/jspui/handle/6653943832/66081 |
_version_ |
1681426388610973696 |
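
The description above mentions safe levels, minority outcasts, and moving synthetic instances away from majority instances, but this record does not give the paper's exact procedure. The following is only a minimal sketch under stated assumptions: the safe level is taken to be the number of minority instances among the k nearest neighbours (as in Safe-Level SMOTE), an "outcast" is assumed to be a minority instance with safe level 0, and `relocate` is a hypothetical rule (pull a synthetic point toward its parent until its nearest original neighbour is a minority instance) that stands in for the authors' relocation step. The function names, the `step` and `max_steps` parameters, and the toy data are all illustrative.

```python
import numpy as np

def safe_levels(X, y, k=3, minority=1):
    """Safe level of each minority instance: the number of minority
    instances among its k nearest neighbours (itself excluded)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    levels = {}
    for i in np.flatnonzero(y == minority):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # never count the instance itself
        nn = np.argsort(d)[:k]
        levels[int(i)] = int(np.sum(y[nn] == minority))
    return levels

def relocate(synthetic, parent, X, y, minority=1, step=0.25, max_steps=4):
    """Hypothetical relocation rule (an assumption, not the paper's method):
    move the synthetic point toward its parent minority instance until its
    nearest original neighbour belongs to the minority class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = np.asarray(synthetic, dtype=float)
    parent = np.asarray(parent, dtype=float)
    for _ in range(max_steps):
        nearest = np.argmin(np.linalg.norm(X - p, axis=1))
        if y[nearest] == minority:
            break                          # already in a minority neighbourhood
        p = p + step * (parent - p)        # shrink the gap to the parent
    return p

# Toy data: three clustered minority points, four majority points,
# and one minority point surrounded by the majority class.
X = np.array([[0, 0], [1, 0], [0, 1],
              [5, 5], [5, 6], [6, 5], [6, 6],
              [5.5, 5.5]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0, 0, 1])

levels = safe_levels(X, y, k=3)
# Assumed definition: an outcast has no minority neighbour among its k nearest.
outcasts = [i for i, s in levels.items() if s == 0]
moved = relocate([4.5, 4.5], parent=X[0], X=X, y=y)
```

On this toy data the isolated minority point at index 7 is the only one flagged as an outcast, and the synthetic point that starts next to the majority cluster is pulled back toward its parent. How the paper's 1-nearest neighbor model then treats such outcasts is not specified in this record, so it is left out of the sketch.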