Adaptive generation-based approaches of oversampling using different sets of base and nearest neighbor's instances

Standard classification algorithms often face a challenge of learning from imbalanced datasets. While several approaches have been employed in addressing this problem, methods that involve oversampling of minority samples remain more widely used in comparison to algorithmic modifications. Most varia...

Bibliographic Details
Main Authors: Nabus, Hatem S. Y., Ali, Aida, Hassan, Shafaatunnur, Shamsuddin, Siti Mariyam, Mustapha, Ismail B., Saeed, Faisal
Format: Article
Language: English
Published: Science and Information Organization 2022
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/100866/1/HatemSYNabus2022_AdaptiveGenerationbasedApproaches.pdf
http://eprints.utm.my/id/eprint/100866/
http://dx.doi.org/10.14569/IJACSA.2022.0130461
Institution: Universiti Teknologi Malaysia
Language: English
id my.utm.100866
record_format eprints
spelling my.utm.100866 2023-05-18T03:44:03Z http://eprints.utm.my/id/eprint/100866/
Science and Information Organization 2022 Article PeerReviewed application/pdf en http://eprints.utm.my/id/eprint/100866/1/HatemSYNabus2022_AdaptiveGenerationbasedApproaches.pdf Nabus, Hatem S. Y. and Ali, Aida and Hassan, Shafaatunnur and Shamsuddin, Siti Mariyam and Mustapha, Ismail B. and Saeed, Faisal (2022) Adaptive generation-based approaches of oversampling using different sets of base and nearest neighbor's instances. International Journal of Advanced Computer Science and Applications, 13 (4). pp. 527-534. ISSN 2158-107X http://dx.doi.org/10.14569/IJACSA.2022.0130461 DOI: 10.14569/IJACSA.2022.0130461
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic QA75 Electronic computers. Computer science
description Standard classification algorithms often struggle to learn from imbalanced datasets. While several approaches have been employed to address this problem, methods that oversample the minority class remain more widely used than algorithmic modifications. Most oversampling variants are derived from the Synthetic Minority Oversampling Technique (SMOTE), which generates each synthetic minority sample at a point in the feature space between two minority-class instances. These variants produce different results mainly because of (1) the samples they use as base (initial-selection) samples and as nearest neighbors, and (2) how they handle minority noise. This paper therefore presents previously unused combinations of base and nearest-neighbor samples and examines their effect in comparison with standard oversampling techniques. Six methods are proposed: three combinations of Only Danger Oversampling (ODO) techniques and three combinations of Danger Noise Oversampling (DNO) techniques. The ODO and DNO methods use different groups of samples as base samples and nearest neighbors. While the three ODO methods do not consider minority noise, the three DNO methods include minority noise in both the base and neighbor samples. The performance of the proposed methods is compared with that of several standard oversampling algorithms. We present experimental results demonstrating a significant improvement in the recall metric.
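The interpolation step that the abstract attributes to SMOTE, and the idea of drawing base samples and neighbors from different groups, can be illustrated with a minimal sketch. This is not the authors' ODO or DNO procedure: the record does not define how "danger" or "noise" samples are identified, so the function name `interpolate_oversample`, its parameters, and the toy data are all hypothetical. It only shows a synthetic point being placed between a base instance and one of its nearest neighbors taken from a separately chosen pool.

```python
import numpy as np

def interpolate_oversample(base, pool, n_new, k=5, seed=0):
    """SMOTE-style interpolation with separate base and neighbor sets.

    base: minority instances used as starting points, shape (n_base, d)
    pool: minority instances eligible as neighbors, shape (n_pool, d)
    Both sets are assumptions of this sketch; how a method chooses them
    (e.g. danger-only or danger-plus-noise) is not specified here.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base, dtype=float)
    pool = np.asarray(pool, dtype=float)
    synthetic = np.empty((n_new, base.shape[1]))
    for i in range(n_new):
        # Pick a random base instance.
        x = base[rng.integers(len(base))]
        # Distances from the base instance to every candidate neighbor.
        dist = np.linalg.norm(pool - x, axis=1)
        # Skip points identical to x (typically x itself, if it is in the pool).
        candidates = np.flatnonzero(dist > 0)
        if candidates.size == 0:
            raise ValueError("pool contains no point distinct from the base instance")
        # Keep the k nearest candidates and choose one at random.
        nearest = candidates[np.argsort(dist[candidates])[:k]]
        neighbor = pool[rng.choice(nearest)]
        # Place the synthetic point somewhere on the segment x -> neighbor.
        gap = rng.random()
        synthetic[i] = x + gap * (neighbor - x)
    return synthetic

# Toy minority class: the first two points act as base samples,
# and all five points are allowed as neighbors.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
new_points = interpolate_oversample(minority[:2], minority, n_new=10, k=3)
print(new_points.shape)
```

Changing which rows are passed as `base` and which as `pool` is the only thing that varies between calls, which is the kind of variation the abstract says distinguishes oversampling variants.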
format Article
author Nabus, Hatem S. Y.
Ali, Aida
Hassan, Shafaatunnur
Shamsuddin, Siti Mariyam
Mustapha, Ismail B.
Saeed, Faisal
author_sort Nabus, Hatem S. Y.
title Adaptive generation-based approaches of oversampling using different sets of base and nearest neighbor's instances
publisher Science and Information Organization
publishDate 2022
url http://eprints.utm.my/id/eprint/100866/1/HatemSYNabus2022_AdaptiveGenerationbasedApproaches.pdf
http://eprints.utm.my/id/eprint/100866/
http://dx.doi.org/10.14569/IJACSA.2022.0130461
_version_ 1768006577186406400