Improving Accuracy of Imbalanced Clinical Data Classification Using Synthetic Minority Over-Sampling Technique

Bibliographic Details
Main Authors: Mumtazimah, Mohamad, Mohd, F, Abdul Jalil, M, Noora, N.M.M, Ismail, S, Yahya, W.F.F
Format: Conference or Workshop Item
Language: English
Published: 2019
Online Access: http://eprints.unisza.edu.my/1819/1/FH03-FIK-20-39681.pdf
http://eprints.unisza.edu.my/1819/
Institution: Universiti Sultan Zainal Abidin
Description
Summary: Imbalanced datasets occur in many real-world applications. Resampling is an effective remedy because it produces a balanced class distribution. In this study, the Synthetic Minority Over-sampling Technique (SMOTE), an over-sampling method, is used to deal with the imbalanced dataset by increasing the number of instances of the minority class. The technique reduces the degree of imbalance in the dataset by generating new synthetic samples, so that a balanced training dataset replaces the class-imbalanced one. The balanced datasets were then used to train machine learning algorithms to diagnose the disease class. In experiments on real-world datasets, namely an oral cancer dataset and an erythemato-squamous diseases dataset from the UCI machine learning datasets, the over-sampling method showed better results in clinical disease classification.
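
The over-sampling step described in the summary can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, standard-library-only version of the SMOTE idea, in which each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are all assumptions made for illustration. Unlike the original SMOTE formulation, which loops over every minority sample a fixed number of times, this sketch picks the base sample at random.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not from the study).

    Creates n_synthetic points by interpolating between randomly chosen
    minority samples and one of their k nearest minority-class neighbours
    (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy data, made up for illustration only (not the clinical datasets).
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5], [12.0, 11.0]]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Only the training data would normally be over-sampled in this way; the test data is left untouched so that the evaluation reflects the original class distribution.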