ASPECT CATEGORIZATION AND SENTIMENT CLASSIFICATION FOR INDONESIAN HOTEL REVIEWS

Aspect-level sentiment analysis is able to obtain more detailed information compared to sentiment analysis at the level of documents or sentences, i.e. information on aspect categories and sentiments contained in the review text. There are three tasks in sentiment analysis at the aspect level, i.e....

Bibliographic Details
Main Author: Nurul Azhar, Annisa
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/40005
Institution: Institut Teknologi Bandung
id id-itb.:40005
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Aspect-level sentiment analysis yields more detailed information than document-level or sentence-level sentiment analysis, namely the aspect categories and the sentiments expressed in the review text. Aspect-level sentiment analysis comprises three tasks: aspect categorization, aspect extraction, and sentiment classification. This final project focuses on aspect categorization and sentiment classification for Indonesian hotel reviews. Aspect categorization is a multilabel classification problem, while sentiment classification is a binary classification problem. The dataset used in this final project consists of 9450 hotel reviews as training data and 509 9450 hotel reviews as test data. Ten aspect categories are considered, and the sentiment polarities considered are positive and negative. To address both tasks, the project uses the Convolutional Neural Network (CNN)-Extreme Gradient Boosting (XGBoost) technique, following Ren et al. (2017), who applied it to multiclass image classification. The CNN topology used to build the CNN-XGBoost model follows the topology in Chen et al. (2017) for multilabel text classification. The features are lexical features represented by word embeddings. The baseline models are vanilla CNN, CNN-Support Vector Machine (SVM), and CNN-Long Short-Term Memory (LSTM). The multilabel classification strategy is either binary relevance or classifier chain. Based on the experimental results, the best combination of CNN parameters (number of filters, window sizes, activation function, and dense units) for aspect categorization is 128, [2,3,4], ReLU, and 128. The best XGBoost hyperparameters (learning rate, minimum height, minimum child weight, gamma, and column sample by tree) are 0.2, 3, 1, 0, and 0.7, respectively.
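The two multilabel strategies named in the abstract, binary relevance and classifier chain, differ only in what each per-label classifier sees as input. The following is a minimal, dependency-free sketch of that difference and not the thesis implementation: the `NearestNeighbor` base learner and all function names are hypothetical stand-ins (the thesis uses XGBoost on CNN features), and the toy data is invented for illustration.

```python
class NearestNeighbor:
    """Hypothetical stand-in base learner (1-nearest neighbour).

    Assumption: any binary classifier with fit/predict could be used here;
    the thesis itself uses XGBoost on CNN features.
    """

    def fit(self, X, y):
        self.X_ = [list(x) for x in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def nearest_label(x):
            # Squared Euclidean distance to every training row.
            dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X_]
            return self.y_[dists.index(min(dists))]

        return [nearest_label(x) for x in X]


def fit_binary_relevance(X, Y, make_clf):
    """Train one independent binary classifier per label column."""
    n_labels = len(Y[0])
    return [make_clf().fit(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, X):
    columns = [m.predict(X) for m in models]
    return [list(row) for row in zip(*columns)]


def fit_classifier_chain(X, Y, make_clf):
    """Train classifiers in label order; each one also sees earlier labels."""
    models = []
    X_aug = [list(x) for x in X]
    for j in range(len(Y[0])):
        y_j = [row[j] for row in Y]
        models.append(make_clf().fit(X_aug, y_j))
        # During training, the true earlier labels are appended as features.
        X_aug = [x + [float(y)] for x, y in zip(X_aug, y_j)]
    return models


def predict_classifier_chain(models, X):
    X_aug = [list(x) for x in X]
    preds = [[] for _ in X]
    for m in models:
        y_hat = m.predict(X_aug)
        for p, y in zip(preds, y_hat):
            p.append(y)
        # At prediction time, the predicted earlier labels are appended instead.
        X_aug = [x + [float(y)] for x, y in zip(X_aug, y_hat)]
    return preds


# Toy data (invented): one feature, two labels per example.
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
br_models = fit_binary_relevance(X, Y, NearestNeighbor)
cc_models = fit_classifier_chain(X, Y, NearestNeighbor)
```

Binary relevance treats the labels as unrelated problems, whereas the chain lets a later label's classifier condition on earlier labels, at the cost of depending on the chosen label order.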
The test F1-measures for the aspect categorization task, the sentiment classification task, and their combination are 0.9217, 0.9690, and 0.7274, respectively. The proposed model outperforms all baseline models on aspect categorization. For sentiment classification, however, the proposed method still performs below the baselines (vanilla CNN, CNN-SVM, and CNN-LSTM) for several aspect categories.
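For readers unfamiliar with the F1-measure on multilabel output, the sketch below shows one common way to compute it. The abstract does not say which averaging was used, so micro-averaging here is an assumption, and the function name `micro_f1` and the example label matrices are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary label matrices (lists of 0/1 rows).

    Assumption: micro-averaging is chosen only for illustration; the
    record does not state how the reported F1 values were averaged.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: two reviews, three aspect categories each.
example_true = [[1, 0, 1], [0, 1, 0]]
example_pred = [[1, 0, 0], [0, 1, 1]]
example_score = micro_f1(example_true, example_pred)
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-measure is also 2/3.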
format Final Project
author Nurul Azhar, Annisa
title ASPECT CATEGORIZATION AND SENTIMENT CLASSIFICATION FOR INDONESIAN HOTEL REVIEWS
url https://digilib.itb.ac.id/gdl/view/40005