ASPECT CATEGORIZATION AND SENTIMENT CLASSIFICATION FOR INDONESIAN HOTEL REVIEWS

Bibliographic Details
Main Author: Nurul Azhar, Annisa
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/40005
Institution: Institut Teknologi Bandung
Description
Summary: Aspect-level sentiment analysis yields more detailed information than sentiment analysis at the document or sentence level, namely the aspect categories and sentiments contained in the review text. It comprises three tasks: aspect categorization, aspect extraction, and sentiment classification. This final project focuses on aspect categorization and sentiment classification for Indonesian hotel reviews. Aspect categorization is a multilabel classification problem, while sentiment classification is a binary classification problem. The dataset consists of 9450 hotel reviews as training data and 509 hotel reviews as test data. Ten aspect categories are considered, and the sentiment polarities considered are positive and negative. Both tasks are addressed with the Convolutional Neural Network (CNN)-Extreme Gradient Boosting (XGBoost) technique, following Ren et al. (2017) on multiclass image classification. The CNN topology used to build the CNN-XGBoost model follows the CNN topology in Chen et al. (2017) for multilabel text classification. The features are lexical features represented by word embeddings. The baseline models are vanilla CNN, CNN-Support Vector Machine (SVM), and CNN-Long Short-Term Memory (LSTM). The multilabel classification strategy is either binary relevance or classifier chain. Based on the experimental results, the best combination of CNN parameters for aspect categorization (number of filters, window sizes, activation function, and dense units) is 128, [2,3,4], ReLU, and 128. The best XGBoost hyperparameter combination (learning rate, minimum height, minimum child weight, gamma, and column sample by tree) is 0.2, 3, 1, 0, and 0.7.
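The summary names two multilabel strategies, binary relevance and classifier chain, without describing them. The sketch below illustrates only the general difference between them and is not the author's implementation. The aspect names, the toy feature vector, and the threshold "classifiers" are hypothetical stand-ins for the ten aspect categories and the trained CNN-XGBoost models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "classifier" here is any function from a feature vector to 0 or 1.
# In the thesis this role is played by trained models; these are toy stand-ins.
Classifier = Callable[[Sequence[float]], int]


def binary_relevance(models: Dict[str, Classifier], x: Sequence[float]) -> Dict[str, int]:
    """One independent yes/no decision per aspect, all on the same features."""
    return {aspect: model(x) for aspect, model in models.items()}


def classifier_chain(models: List[Tuple[str, Classifier]], x: Sequence[float]) -> Dict[str, int]:
    """Aspects are predicted in a fixed order; each prediction is appended
    to the feature vector seen by the classifiers that come after it."""
    features = list(x)
    predictions: Dict[str, int] = {}
    for aspect, model in models:
        predictions[aspect] = model(features)
        features.append(float(predictions[aspect]))
    return predictions


# Hypothetical three-feature review representation (not real embeddings).
x = [0.9, 0.2, 0.7]

# Binary relevance: each aspect looks only at the original features.
br_models: Dict[str, Classifier] = {
    "room": lambda f: int(f[0] > 0.5),
    "service": lambda f: int(f[1] > 0.5),
    "location": lambda f: int(f[2] > 0.5),
}

# Classifier chain: "service" may also use the earlier "room" prediction,
# which sits at index 3 once it has been appended.
chain_models: List[Tuple[str, Classifier]] = [
    ("room", lambda f: int(f[0] > 0.5)),
    ("service", lambda f: int(f[1] > 0.5 or f[3] == 1.0)),
    ("location", lambda f: int(f[2] > 0.5)),
]

br_result = binary_relevance(br_models, x)
chain_result = classifier_chain(chain_models, x)
print(br_result)
print(chain_result)
```

In this toy setup the two strategies disagree on "service" only because the chained classifier is allowed to see the earlier "room" decision. Binary relevance ignores such label dependencies by construction.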
On the test data, the F1-measures for aspect categorization, sentiment classification, and the two tasks combined are 0.9217, 0.9690, and 0.7274, respectively. The proposed model outperforms all baseline models in aspect categorization. For sentiment classification, however, the proposed method still performs below the baselines (vanilla CNN, CNN-SVM, and CNN-LSTM) on several aspect categories.
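The results are reported as F1-measure, but the summary does not say how the score is averaged across labels. As a minimal sketch, assuming micro-averaging over predicted and gold label sets, the measure could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the thesis.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-review label sets (an assumed averaging scheme)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels predicted and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up aspect labels for three reviews.
gold = [{"room", "service"}, {"location"}, {"food"}]
pred = [{"room"}, {"location", "price"}, {"food"}]

score = micro_f1(gold, pred)
print(round(score, 4))
```

Here three labels are correct, one is spurious, and one is missed, so precision and recall are both 3/4 and the F1-measure is 0.75.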