HYBRID SVM-RF TO IMPROVE THE PERFORMANCE OF RFE FEATURE SELECTION IN SENTIMENT ANALYSIS
Product reviews from other buyers on e-commerce platforms can indicate whether a product is suitable for sensitive skin, ideal for specific body types, and much more. A single product can accumulate thousands of reviews with widely varying content. That, of course, can have a nega...
Main Author: | Kayla Amory, Talitha |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/81892 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:81892 |
---|---|
spelling |
id-itb.:81892 2024-07-05T03:56:35Z HYBRID SVM-RF TO IMPROVE THE PERFORMANCE OF RFE FEATURE SELECTION IN SENTIMENT ANALYSIS Kayla Amory, Talitha Indonesia Theses feature selection, hybrid, random forest, recursive feature elimination, sentiment analysis, support vector machine. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/81892 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Product reviews from other buyers on e-commerce platforms can indicate whether a product
is suitable for sensitive skin, ideal for specific body types, and much more. A single
product can accumulate thousands of reviews with widely varying content. This
disadvantages both sellers and buyers, who do not have enough time to read and analyse
every review. A system that can analyse these reviews automatically is therefore
needed. Because review data can run into the thousands, choosing the features that give
the best performance in sentiment analysis is essential. Such data also tend to be
high-dimensional, which makes them hard to process and can lead to overfitting.
Dimensionality reduction can address this, and feature selection is one such approach.
One feature selection method is Recursive Feature Elimination (RFE). RFE eliminates
features with weak feature importance values, and the calculation of feature importance
depends on the estimator used. As a result, RFE with one estimator can produce different
results from RFE with another. RFE may sometimes eliminate features that would score
well under a different estimator, because the estimator in use produces suboptimal
feature importance values. This study proposes a hybrid RFE method in which the
estimators are combined to produce a combined feature importance. RFE then uses this
combined importance to eliminate weak features more reliably and thereby improve the
performance of the sentiment analysis system. Specifically, the study combines Support
Vector Machine (SVM) and Random Forest (RF) estimators. The proposed SVMRF-RFE hybrid
method generally improves sentiment analysis performance compared with the baseline
methods. |
format |
Theses |
author |
Kayla Amory, Talitha |
title |
HYBRID SVM-RF TO IMPROVE THE PERFORMANCE OF RFE FEATURE SELECTION IN SENTIMENT ANALYSIS |
url |
https://digilib.itb.ac.id/gdl/view/81892 |
_version_ |
1822997482823483392 |
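
The description above says that the hybrid method combines the feature importances of two estimators before RFE eliminates the weakest feature, but this record does not state how the importances are combined. The following is a minimal sketch of that general idea, not the thesis's implementation. It assumes each estimator's scores are normalised to sum to 1 and then averaged. The names `hybrid_rfe`, `normalize`, `mean_gap_scorer`, and `stump_scorer` are hypothetical, and the two scorers are simple stand-ins (not an actual SVM or Random Forest) so that the example runs with the standard library only.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]
Scorer = Callable[[Matrix, Sequence[int]], List[float]]


def normalize(scores: List[float]) -> List[float]:
    """Scale non-negative scores to sum to 1 (uniform if every score is 0)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def mean_gap_scorer(X: Matrix, y: Sequence[int]) -> List[float]:
    """Stand-in for a linear model's |weight|: gap between class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(len(X[0]))
    ]


def stump_scorer(X: Matrix, y: Sequence[int]) -> List[float]:
    """Stand-in for a tree importance: best one-threshold accuracy above chance."""
    n = len(y)
    majority = max(sum(y), n - sum(y)) / n
    scores = []
    for j in range(len(X[0])):
        best = majority
        for t in sorted({row[j] for row in X}):
            hits = sum((row[j] > t) == (label == 1) for row, label in zip(X, y))
            best = max(best, hits / n, (n - hits) / n)
        scores.append(best - majority)
    return scores


def hybrid_rfe(X: Matrix, y: Sequence[int], scorers: Sequence[Scorer],
               n_keep: int) -> List[int]:
    """Repeatedly drop the feature with the lowest averaged importance."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        # Re-score only the surviving features, as RFE refits each round.
        X_sub = [[row[j] for j in remaining] for row in X]
        per_scorer = [normalize(score(X_sub, y)) for score in scorers]
        # Assumption: combine by averaging the normalised scores.
        combined = [sum(col) / len(scorers) for col in zip(*per_scorer)]
        weakest = min(range(len(remaining)), key=combined.__getitem__)
        del remaining[weakest]
    return remaining


# Toy data: features 0 and 2 carry signal, features 1 and 3 do not.
X = [
    [5, 1, 3, 2],
    [6, 1, 4, 0],
    [7, 1, 2, 1],
    [1, 1, 1, 1],
    [2, 1, 0, 2],
    [3, 1, 2, 0],
]
y = [1, 1, 1, 0, 0, 0]

print(hybrid_rfe(X, y, [mean_gap_scorer, stump_scorer], n_keep=2))  # → [0, 2]
```

With scikit-learn, the two stand-in scorers could be replaced by functions that fit a `LinearSVC` and return the absolute values of `coef_`, and fit a `RandomForestClassifier` and return `feature_importances_`. Normalising before averaging matters because the two kinds of score are on different scales.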