HYBRID SVM-RF TO IMPROVE THE PERFORMANCE OF RFE FEATURE SELECTION IN SENTIMENT ANALYSIS
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/81892 |
Institution: | Institut Teknologi Bandung |
Summary: | Product reviews from other buyers on e-commerce platforms can indicate whether a
product is suitable for sensitive skin, ideal for specific body types, and much more. A
single product can accumulate thousands of reviews with widely varying content. This is
a problem for both sellers and buyers, who do not have enough time to read and analyse
every review of a product. A system that can analyse these reviews automatically is
therefore needed. Because review data can run into the thousands, choosing the optimal
features is essential for achieving the best sentiment analysis performance. Such data
also tend to be high-dimensional, which makes them difficult to process and can lead to
overfitting. Dimensionality reduction can address this, for example through feature
selection. One feature selection method is Recursive Feature Elimination (RFE). RFE
eliminates features with weak feature importance values, and the calculation of feature
importance in RFE depends on the estimator used. As a result, RFE with one estimator
can produce different importance values, and therefore select different features, than
RFE with another estimator. RFE sometimes eliminates features that would have good
importance values under a different estimator, because the estimator in use yields
importance values that are not optimal. This study proposes a hybrid RFE method in
which estimators are combined to produce a combined feature importance. RFE then uses
this combined importance to eliminate weak features more reliably and so improve the
performance of the sentiment analysis system. Specifically, the study combines Support
Vector Machine (SVM) and Random Forest (RF) estimators. The proposed hybrid SVMRF-RFE
method generally improves sentiment analysis performance compared with other baseline
methods. |
---|
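
The idea of feeding RFE a combined feature importance can be illustrated with a minimal sketch. The summary does not say how the SVM and RF importances are merged, so averaging normalized scores is an assumption made here for illustration only. The two scorers, `mean_gap_importance` and `stump_importance`, are hypothetical pure-Python stand-ins for a linear-model weight and a tree-based importance, not the estimators used in the thesis. With scikit-learn one might instead take the absolute `coef_` of a linear SVM and the `feature_importances_` of a `RandomForestClassifier`. The function name `hybrid_rfe` and the toy data are also invented for this example.

```python
def mean_gap_importance(X, y, active):
    """Stand-in for a linear-model weight: gap between the class means."""
    scores = []
    for j in active:
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def stump_importance(X, y, active):
    """Stand-in for a tree-based importance: accuracy gain of a one-split stump."""
    scores = []
    for j in active:
        values = [row[j] for row in X]
        threshold = sum(values) / len(values)
        hits = sum((v > threshold) == (label == 1) for v, label in zip(values, y))
        accuracy = hits / len(y)
        scores.append(max(accuracy, 1 - accuracy) - 0.5)
    return scores


def normalize(scores):
    """Scale scores to sum to 1 so different estimators are comparable."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def hybrid_rfe(X, y, estimators, n_keep, step=1):
    """Recursively drop the features with the lowest combined importance.

    Assumption: the combined importance is the mean of the normalized
    per-estimator importances, recomputed on the surviving features
    after every elimination round.
    """
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        combined = [0.0] * len(active)
        for estimator in estimators:
            for i, score in enumerate(normalize(estimator(X, y, active))):
                combined[i] += score / len(estimators)
        n_drop = min(step, len(active) - n_keep)
        weakest = set(sorted(range(len(active)), key=combined.__getitem__)[:n_drop])
        active = [f for i, f in enumerate(active) if i not in weakest]
    return active


# Toy data: features 0 and 2 separate the classes, features 1 and 3 are noise.
X = [
    [1.0, 0.5, 3.0, 0.2],
    [0.9, 0.4, 2.8, 0.3],
    [1.1, 0.6, 3.2, 0.2],
    [0.1, 0.5, 0.9, 0.3],
    [0.0, 0.4, 1.1, 0.2],
    [0.2, 0.6, 1.0, 0.3],
]
y = [1, 1, 1, 0, 0, 0]

selected = hybrid_rfe(X, y, [mean_gap_importance, stump_importance], n_keep=2)
print(selected)
```

On this toy data the two noise features are eliminated first. Because each estimator's scores are normalized before averaging, neither scorer dominates the ranking simply by producing larger raw values.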