HYBRID SVM-RF TO IMPROVE THE PERFORMANCE OF RFE FEATURE SELECTION IN SENTIMENT ANALYSIS

Bibliographic Details
Main Author: Kayla Amory, Talitha
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/81892
Institution: Institut Teknologi Bandung
Description
Summary: Product reviews from other buyers on e-commerce platforms can indicate, for example, whether a product is suitable for sensitive skin or ideal for a specific body type. A single product can accumulate thousands of reviews with widely varying content. This is a problem for both sellers and buyers, who do not have time to read and analyse every review of a product. A system that can analyse these reviews automatically is therefore needed. Because review data can run into the thousands, choosing the optimal features is essential for achieving the best sentiment analysis performance. The data also tend to be high-dimensional, which makes them difficult to process and prone to overfitting. Dimensionality reduction can address this, and one way to achieve it is feature selection. One feature selection method is Recursive Feature Elimination (RFE). RFE repeatedly eliminates the features with the weakest feature importance values, and how those values are calculated depends on the estimator used. Consequently, RFE with one estimator can produce different importance values, and therefore a different set of selected features, than RFE with another estimator. RFE may eliminate features that a different estimator would rate as important, simply because the chosen estimator assigns them suboptimal importance values. This study proposes a hybrid RFE method in which the estimators are combined to produce a single combined feature importance, which RFE then uses to eliminate weak features more reliably and to improve the performance of the sentiment analysis system. Specifically, the study combines Support Vector Machine (SVM) and Random Forest (RF) estimators. The proposed SVMRF-RFE hybrid method generally improves sentiment analysis performance compared with the baseline methods.
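As a rough illustration of the hybrid idea described above, the following Python sketch runs an RFE-style elimination loop driven by a combined SVM and RF importance score, using scikit-learn. It is not the author's implementation. The summary does not state how the two importance vectors are combined, so this sketch assumes each one is normalized to sum to 1 and the two are averaged with equal weights. SVM importance is taken as the absolute `LinearSVC` weight per feature, and RF importance as `RandomForestClassifier.feature_importances_` (impurity-based). The function names `combined_importance`, `hybrid_rfe`, and `_normalize` are hypothetical, and synthetic data from `make_classification` stands in for vectorized review text (in a real sentiment analysis pipeline, `X` would be something like a TF-IDF matrix).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC


def _normalize(v):
    """Scale an importance vector to sum to 1 (uniform if it is all zeros)."""
    total = v.sum()
    return v / total if total > 0 else np.full_like(v, 1.0 / v.size)


def combined_importance(X, y, random_state=0):
    """Hypothetical hybrid score: equal-weight mean of normalized SVM and RF importances."""
    # Linear SVM importance: absolute weight per feature, averaged over classes.
    svm = LinearSVC(C=1.0, max_iter=10000, random_state=random_state).fit(X, y)
    svm_imp = np.abs(svm.coef_).mean(axis=0)
    # Random Forest importance: mean decrease in impurity.
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state).fit(X, y)
    rf_imp = rf.feature_importances_
    # Normalizing both vectors first keeps one estimator's scale from dominating
    # (an assumption; the thesis may weight or scale them differently).
    return (_normalize(svm_imp) + _normalize(rf_imp)) / 2.0


def hybrid_rfe(X, y, n_features_to_select, step=1, random_state=0):
    """RFE loop that ranks features by the combined score instead of one estimator."""
    remaining = np.arange(X.shape[1])  # original column indices still in play
    while remaining.size > n_features_to_select:
        # Refit both estimators on the surviving features only.
        scores = combined_importance(X[:, remaining], y, random_state)
        n_drop = min(step, remaining.size - n_features_to_select)
        # Eliminate the n_drop features with the weakest combined score.
        weakest = np.argsort(scores)[:n_drop]
        remaining = np.delete(remaining, weakest)
    return remaining


X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=42
)
selected = hybrid_rfe(X, y, n_features_to_select=5, step=2)
print("Selected feature indices:", selected)
```

Each round refits both estimators on the surviving features, mirroring how scikit-learn's single-estimator `RFE` refits its estimator before every elimination step; only the ranking criterion differs. The selected indices are then used to train the final sentiment classifier on the reduced feature set.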