A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2021 |
Subjects: | |
Online Access: | http://eprints.uthm.edu.my/8496/1/24p%20SYARIZUL%20AMRI%20MOHD%20DZULKIFLI.pdf http://eprints.uthm.edu.my/8496/2/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/8496/3/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20WATERMARK.pdf http://eprints.uthm.edu.my/8496/ |
Institution: | Universiti Tun Hussein Onn Malaysia |
Summary: | Classification is a predictive modeling problem in which a class label is predicted
for a given example of input data. Data is everywhere, and the amount of digital data
in existence is growing exponentially. However, data is rarely perfect, and many
inconsistencies, such as noise data, affect data quality. Nowadays, SVM is a very
promising approach for big data classification. SVM provides a global solution for
data classification, but it is highly sensitive to noise data and may not be effective
when the level of noise is high. When noise exists in the training data, the SVM decision
boundary can deviate severely from the optimal hyperplane. To overcome this drawback of
SVM, WSVM using the KPCM algorithm was used. However, WSVM with a kernel-based learning
algorithm such as KPCM suffers from high training complexity, expensive computation time
and large memory requirements when noise data contaminates the training data. WKM-SVM
was therefore proposed, using clustering as a simple pruning and speed-up method. However,
WKM-SVM has several limitations related to k-Means clustering. One of them is that the
cluster centers may not suitably represent the original data structure, which can lead
to poor prediction results. Therefore, this research proposes a modified WSVM that
combines an instance selection method with weighted learning to improve WSVM training
and classification accuracy. The modified WSVM reduces noise data by producing multiple
hyperplanes and selecting the optimal hyperplane based on the lowest noise data. Overall,
the results show that the proposed method outperforms WSVM, OWSVM and WKM-SVM on all
datasets in terms of classification accuracy. Specifically, the proposed method achieves
classification accuracy of 85% or higher on three datasets and lower than 85% on the
remaining six. However, the performance of the proposed method on test data may not be
as good as anticipated, since six of the nine datasets yielded classification accuracy
below 85%. |
---|
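The summary does not state the WSVM objective itself. For reference, the standard weighted soft-margin formulation used in the WSVM literature is reproduced below; the notation (per-example weights s_i, feature map phi, penalty C) is assumed here and is not taken from the thesis.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} s_i \xi_i
\quad \text{subject to} \quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad 0 < s_i \le 1, \quad i = 1, \dots, n
```

In the dual, the only change from the ordinary SVM is the box constraint 0 <= alpha_i <= C s_i, so an example given a small weight (for instance, one suspected to be noise) can exert only a bounded pull on the hyperplane. Setting every s_i = 1 recovers the standard soft-margin SVM.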
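The summary's point that noise makes the SVM boundary deviate from the optimal hyperplane, and that weighting counters this, can be seen on a toy problem. The following is a minimal sketch under stated assumptions: a linear kernel, full-batch subgradient descent with a fixed step size, and hand-chosen weights. The function names `train_weighted_linear_svm` and `decision` are hypothetical, and this is not the training procedure used in the thesis.

```python
# Sketch: linear weighted soft-margin SVM trained by full-batch subgradient descent.
# Assumptions (not from the thesis): linear kernel, fixed step size, toy 1-D data.


def train_weighted_linear_svm(X, y, s, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w.x_i + b)).

    X: list of feature vectors; y: labels in {-1, +1};
    s: per-example weights in (0, 1], small for suspected noise.
    """
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi, si in zip(X, y, s):
            if yi * decision(w, b, xi) < 1:  # hinge term active for this example
                for j in range(d):
                    grad_w[j] -= C * si * yi * xi[j]
                grad_b -= C * si * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data: the last example sits among the positives but is labelled -1 (noise).
X = [[2.0], [3.0], [-2.0], [-3.0], [2.5]]
y = [1, 1, -1, -1, -1]

w_plain, b_plain = train_weighted_linear_svm(X, y, s=[1, 1, 1, 1, 1])
w_wtd, b_wtd = train_weighted_linear_svm(X, y, s=[1, 1, 1, 1, 0.1])

# For a 1-D linear model the decision boundary is x = -b / w.
print("unweighted boundary:", -b_plain / w_plain[0])
print("weighted boundary:  ", -b_wtd / w_wtd[0])
```

For this toy set, the exact minimiser of the unweighted objective is w = 0.4, b = -0.2, so the boundary sits at x = 0.5, pulled toward the positive class by the mislabelled point. With weight 0.1 on that point, the minimiser stays at w = 0.5, b = 0 (boundary x = 0), the same as if the noise were absent. The fixed-step iterates only approximate these values.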
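The summary attributes WKM-SVM's weakness to k-Means clustering, whose centres "may not suitably represent original data structures". The sketch below derives per-example weights from per-class k-means centres and shows how that can backfire. The weighting rule s_i = 1 / (1 + d_i / mean_d) and the deterministic first-k seeding are hypothetical choices made for this illustration; they are not claimed to be the exact rule used by WKM-SVM.

```python
import math

# Sketch of k-means-derived example weights for a weighted SVM.
# Assumptions (hypothetical, not the WKM-SVM authors' exact rule): clustering is
# done per class, and s_i = 1 / (1 + d_i / mean_d), where d_i is the distance from
# x_i to the nearest centre of its own class and mean_d is that class's mean d_i.


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; seeds with the first k points for determinism."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def cluster_weights(X, y, k):
    """Return one weight in (0, 1] per example; far-from-centre points get less."""
    weights = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        pts = [X[i] for i in idx]
        centres = kmeans(pts, min(k, len(pts)))
        dists = [min(math.dist(p, c) for c in centres) for p in pts]
        mean_d = sum(dists) / len(dists) or 1.0  # avoid dividing by zero
        for i, d in zip(idx, dists):
            weights[i] = 1.0 / (1.0 + d / mean_d)
    return weights


# One class: a tight group of four points plus one far outlier.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
y = [1, 1, 1, 1, 1]

w_k1 = cluster_weights(X, y, k=1)  # outlier is far from the single centre
w_k2 = cluster_weights(X, y, k=2)  # outlier captures a centre of its own
print("k=1:", [round(v, 3) for v in w_k1])
print("k=2:", [round(v, 3) for v in w_k2])
```

With k = 1 the outlier receives the smallest weight, as intended. With k = 2 it becomes its own cluster centre and receives the maximum weight of 1.0, while the genuine group is down-weighted: a small-scale instance of cluster centres failing to represent the underlying data structure.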
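The summary says the modified WSVM combines instance selection with weighted learning, produces multiple hyperplanes and selects the one with the lowest noise data, but it gives no algorithmic detail. The sketch below only illustrates those two ingredients using stand-ins that are assumptions of this example: Wilson's edited nearest neighbour (ENN) rule, a classic instance-selection method, and "fewest margin violations" as a hypothetical noise criterion for choosing among candidate hyperplanes. All function names are hypothetical.

```python
import math
from collections import Counter

# Sketch of the two ingredients the summary names for the modified WSVM.
# Stand-ins (assumptions, not the thesis's method): Wilson's edited nearest
# neighbour (ENN) rule for instance selection, and "fewest margin violations"
# as the criterion for picking among several candidate hyperplanes.


def edited_nearest_neighbour(X, y, k=3):
    """Return indices of examples whose k nearest neighbours mostly agree with them."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(xi, X[j]))
        majority, _ = Counter(y[j] for j in others[:k]).most_common(1)[0]
        if majority == yi:
            keep.append(i)
    return keep


def margin_violations(w, b, X, y):
    """Hypothetical noise proxy: examples misclassified or inside the margin."""
    return sum(1 for xi, yi in zip(X, y)
               if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1)


def select_hyperplane(candidates, X, y):
    """Index of the candidate (w, b) with the fewest margin violations."""
    return min(range(len(candidates)),
               key=lambda c: margin_violations(*candidates[c], X, y))


pos = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
neg = [-2.0, -2.2, -2.4, -2.6, -2.8, -3.0]
X = [[v] for v in pos + neg] + [[2.5]]  # last example: mislabelled noise
y = [1] * len(pos) + [-1] * len(neg) + [-1]

kept = edited_nearest_neighbour(X, y, k=3)
X_sel = [X[i] for i in kept]
y_sel = [y[i] for i in kept]

# Candidate hyperplanes, e.g. from SVMs trained on different subsets or weightings.
candidates = [([0.5], 0.0), ([0.4], -0.2), ([1.0], -2.0)]
best = select_hyperplane(candidates, X_sel, y_sel)
print("kept:", kept)
print("selected hyperplane:", candidates[best])
```

ENN drops only the mislabelled point at 2.5, whose three nearest neighbours are all positive. Among the candidates, ([0.5], 0.0) is the only one with no margin violations on the selected instances, so it is chosen. In a fuller version, the candidates would come from weighted SVMs trained on different selected subsets rather than being written by hand.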