Improving sentiment reviews classification performance using support vector machine-fuzzy matching algorithm
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | IAES, 2023 |
Subjects: | |
Online Access: | http://umpir.ump.edu.my/id/eprint/36817/1/Improving%20sentiment%20reviews%20classification%20performance.pdf http://umpir.ump.edu.my/id/eprint/36817/ https://doi.org/10.11591/eei.v12i3.4830 |
Institution: | Universiti Malaysia Pahang |
Summary: | High dimensionality in data sets is one of the challenges faced in classification, data mining, and sentiment analysis. A data set with many dimensions takes effort to simplify, and those dimensions have a major impact on the complexity and performance of the algorithms used for classification. The challenges encountered include determining the optimal combination of pre-processing techniques, cleaning the data set, and choosing the best classification algorithm. This study uses a new approach that combines three techniques: tokenizing, lowercasing, and stemming as a pre-processing series; a support vector machine (SVM) for supervised classification; and fuzzy matching (FM) for dimensionality reduction. The proposed model was applied to three data sets: Amazon product reviews, movie reviews, and airline reviews from Twitter. The study reports better results than previous work: SVM combined with FM reaches 96% accuracy, so the SVM-FM combination can be considered the best combination for sentiment analysis on the given data sets. |
---|
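
The summary names the pipeline stages but not how they are wired together. The following is a minimal sketch of the pre-processing and fuzzy-matching stages only, under stated assumptions: the record does not say how FM reduces dimensionality, so the sketch assumes it merges near-identical tokens into one feature. The function names (`preprocess`, `fuzzy_merge`, `bag_of_words`), the 0.8 similarity threshold, the toy suffix stripper, and the sample reviews are all hypothetical and are not taken from the article. The SVM stage is left out because it needs a third-party library; the reduced feature counts would be the input to it.

```python
import re
from difflib import SequenceMatcher


def preprocess(text):
    """Tokenize, lowercase, and crudely stem a review.

    The suffix stripper is a toy stand-in for a real stemmer (assumption).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    stemmed = []
    for tok in tokens:
        for suffix in ("ing", "ed", "s"):
            # Strip a suffix only when at least three characters remain.
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stemmed.append(tok)
    return stemmed


def fuzzy_merge(vocabulary, threshold=0.8):
    """Map each token to the first earlier token it closely resembles.

    Tokens whose similarity ratio reaches the threshold share one feature,
    which shrinks the vocabulary (assumed role of fuzzy matching here).
    """
    representatives = []
    mapping = {}
    for tok in vocabulary:
        for rep in representatives:
            if SequenceMatcher(None, tok, rep).ratio() >= threshold:
                mapping[tok] = rep
                break
        else:
            # No close match found: the token becomes its own representative.
            representatives.append(tok)
            mapping[tok] = tok
    return mapping


def bag_of_words(tokens, mapping):
    """Count tokens after replacing each with its representative."""
    counts = {}
    for tok in tokens:
        key = mapping.get(tok, tok)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical sample reviews containing misspellings.
reviews = [
    "Excellent flight, excelent service",
    "Delayed flights and terrible servise",
]
token_lists = [preprocess(r) for r in reviews]
vocabulary = sorted({tok for toks in token_lists for tok in toks})
mapping = fuzzy_merge(vocabulary)
features = [bag_of_words(toks, mapping) for toks in token_lists]
reduced_vocabulary = sorted(set(mapping.values()))
```

In this sketch the representative of a group is simply whichever spelling comes first in sorted order, so a misspelling can end up as the feature name. Choosing the most frequent spelling instead would be a reasonable refinement.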