HYBRID SAMPLING METHOD BASED ON DBSCAN AND PARTICLE SWARM OPTIMIZATION (PSO) FOR IMBALANCED DATA CLASSIFICATION

Bibliographic Details
Main Author: Fauzi, Ihsan
Format: Theses
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/78368
Institution: Institut Teknologi Bandung
Description
Summary: Imbalanced data refers to a condition in which there is a significant disparity between the number of data points in one class and the number in another class. On some imbalanced datasets, classification algorithms fail to predict the minority class accurately even though they achieve high overall accuracy. Yet accurate prediction of the minority class is what matters most, for example in the diagnosis of rare medical diseases, where detecting the disease is crucial. To address the imbalance, this research proposes a hybrid sampling method that combines the undersampling method proposed by Mirzaei et al. with the oversampling method proposed by Xiaolong et al. Both methods resample the data on the basis of density, using the DBSCAN algorithm. However, DBSCAN is highly sensitive to its minPts and Eps parameters, and other research has therefore used Particle Swarm Optimization (PSO) to determine these two values. Accordingly, the hybrid sampling method proposed in this research uses PSO to determine the minPts and Eps values of the DBSCAN runs used for both undersampling and oversampling.
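How PSO can tune DBSCAN's Eps and minPts is illustrated by the minimal Python/NumPy sketch below. It is not the thesis's implementation. The summary does not state the fitness function, the search ranges, or the PSO coefficients, so all three are assumptions here. The fitness is taken to be the mean silhouette of clustered points multiplied by the fraction of points not labelled as noise. The search box is Eps in [0.05, 2.0] and minPts in [2, 10], and the weights are w = 0.7 and c1 = c2 = 1.5. The helper names (`pairwise_distances`, `dbscan`, `fitness`, `pso_dbscan`) are hypothetical. Each particle is a candidate (Eps, minPts) pair, and minPts is rounded to an integer before DBSCAN is run.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def dbscan(D, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(D)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]  # includes the point itself
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through density-reachable points
            j = stack.pop()
            if not core[j]:
                continue  # border points join the cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


def fitness(D, labels):
    """ASSUMED objective: mean silhouette of clustered points x fraction of non-noise points."""
    mask = labels >= 0
    ids = np.unique(labels[mask])
    if len(ids) < 2:
        return -1.0  # silhouette is undefined for fewer than two clusters
    scores = []
    for i in np.flatnonzero(mask):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].mean()  # mean distance to own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(scores)) * float(mask.mean())


def pso_dbscan(X, n_particles=12, iters=30, seed=0):
    """Global-best PSO over (Eps, minPts); returns the best pair and its fitness."""
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)  # computed once, reused by every particle evaluation
    lo, hi = np.array([0.05, 2.0]), np.array([2.0, 10.0])  # ASSUMED search box

    def score(p):
        return fitness(D, dbscan(D, p[0], int(round(p[1]))))  # minPts rounded to int

    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([score(p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5  # ASSUMED inertia and acceleration weights
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)  # keep particles inside the search box
        vals = np.array([score(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmax()].copy()
    return float(gbest[0]), int(round(gbest[1])), float(pbest_val.max())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = [(0.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
    X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in centers])
    eps, min_pts, best = pso_dbscan(X)
    print(f"Eps={eps:.3f}, minPts={min_pts}, fitness={best:.3f}")
```

Precomputing the distance matrix keeps each particle evaluation cheap on small data. For larger datasets, a library DBSCAN such as scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`) would be the practical substitute inside `score`.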
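The resampling side of a density-based hybrid method is illustrated by the simplified sketch below. It is not the undersampling method of Mirzaei et al. or the oversampling method of Xiaolong et al., whose details are not given in this summary. The following rules are all assumptions: drop majority points that DBSCAN marks as noise, thin the remaining majority points at random, create minority points by interpolating between two members of the same minority cluster (a SMOTE-like rule), and bring both classes to the midpoint size. The function `hybrid_resample` is hypothetical. It takes per-class DBSCAN labels as input, with -1 meaning noise, so the labels can come from any DBSCAN run with chosen Eps and minPts.

```python
import numpy as np


def hybrid_resample(X_maj, lab_maj, X_min, lab_min, seed=0):
    """Density-guided hybrid resampling driven by DBSCAN labels (-1 = noise).

    Simplified illustration only; NOT the exact procedures of Mirzaei et al.
    or Xiaolong et al. The midpoint target size is an ASSUMPTION.
    """
    rng = np.random.default_rng(seed)
    target = (len(X_maj) + len(X_min)) // 2

    # Undersampling (assumed rule): drop majority noise points, then thin the rest
    # uniformly at random, so each cluster keeps its share of points in expectation.
    keep = X_maj[lab_maj >= 0]
    if len(keep) > target:
        keep = keep[rng.choice(len(keep), size=target, replace=False)]

    # Oversampling (assumed rule): interpolate between two minority points of the same
    # DBSCAN cluster, so synthetic points stay in dense minority regions; noise never seeds.
    seeds, seed_lab = X_min[lab_min >= 0], lab_min[lab_min >= 0]
    n_new = max(target - len(X_min), 0) if len(seeds) else 0
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(seeds))
        mates = np.flatnonzero(seed_lab == seed_lab[i])
        if len(mates) > 1:
            mates = mates[mates != i]  # prefer a different member of the cluster
        j = rng.choice(mates)
        synthetic.append(seeds[i] + rng.random() * (seeds[j] - seeds[i]))
    X_min_new = np.vstack([X_min] + synthetic) if synthetic else X_min

    X_res = np.vstack([keep, X_min_new])
    y_res = np.concatenate([np.zeros(len(keep), dtype=int), np.ones(len(X_min_new), dtype=int)])
    return X_res, y_res


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X_maj = rng.normal(0.0, 1.0, size=(90, 2))
    X_min = rng.normal(3.0, 0.3, size=(10, 2))
    # Stand-in labels for illustration; in practice they would come from DBSCAN run per class.
    lab_maj = np.where(np.linalg.norm(X_maj, axis=1) < 2.5, 0, -1)
    lab_min = np.zeros(len(X_min), dtype=int)
    X_res, y_res = hybrid_resample(X_maj, lab_maj, X_min, lab_min)
    print(np.bincount(y_res))
```

Every synthetic point is a convex combination of two members of the same minority cluster. The choice of Eps and minPts therefore decides both which points count as noise and where new points may appear, which is why tuning those parameters affects both halves of a hybrid method.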