ASPECT CATEGORIZATION AND SENTIMENT CLASSIFICATION FOR INDONESIAN HOTEL REVIEWS

Aspect-level sentiment analysis is able to obtain more detailed information compared to sentiment analysis at the level of documents or sentences, i.e. information on aspect categories and sentiments contained in the review text. There are three tasks in sentiment analysis at the aspect level, i.e....

Bibliographic Details
Main Author:	Nurul Azhar, Annisa
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/40005
Institution:	Institut Teknologi Bandung

id	id-itb.:40005
spelling	id-itb.:40005 2019-06-28T15:23:39Z ASPECT CATEGORIZATION AND SENTIMENT CLASSIFICATION FOR INDONESIAN HOTEL REVIEWS Nurul Azhar, Annisa Indonesia Final Project aspect categorization, sentiment classification, multilabel classification, single label classification, convolutional neural network, extreme gradient boosting INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/40005 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Aspect-level sentiment analysis yields more detailed information than sentiment analysis at the document or sentence level, namely the aspect categories and sentiments expressed in the review text. Aspect-level sentiment analysis comprises three tasks: aspect categorization, aspect extraction, and sentiment classification. This final project focuses on aspect categorization and sentiment classification for Indonesian hotel reviews. Aspect categorization is a multilabel classification problem, while sentiment classification is a binary classification problem. The dataset used in this final project consists of 9450 hotel reviews as training data and 509 9450 hotel reviews as test data. Ten aspect categories are considered, and the sentiment polarities considered are positive and negative. To address both tasks, the Convolutional Neural Network (CNN)-Extreme Gradient Boosting (XGBoost) technique is used, following Ren et al. (2017) on multiclass image classification. The CNN topology used to build the CNN-XGBoost model follows that of Chen et al. (2017) on multilabel text classification. The feature used is a lexical feature represented by word embeddings. The baseline models are vanilla CNN, CNN-Support Vector Machine (SVM), and CNN-Long Short-Term Memory (LSTM). The multilabel classification strategy is either binary relevance or classifier chain. Based on the experimental results, the best combination of CNN parameters for aspect categorization (number of filters, window sizes, activation function, and dense units) is 128, [2,3,4], ReLU, and 128. The best XGBoost hyperparameters (learning rate, minimum height, minimum child weight, gamma, and column sample by tree) are 0.2, 3, 1, 0, and 0.7.
The F1-measure test results for aspect categorization, sentiment classification, and the two combined are 0.9217, 0.9690, and 0.7274, respectively. The proposed model outperforms all baseline models in aspect categorization. For sentiment classification, however, the proposed method still performs below the baselines (vanilla CNN, CNN-SVM, and CNN-LSTM) for several aspect categories.
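The abstract names two multilabel strategies, binary relevance and classifier chain, without showing how they differ. The following is a minimal sketch of both, not the project's implementation. It assumes a toy nearest-centroid learner (the hypothetical `centroid_trainer`) as a stand-in for the XGBoost classifier, and two-dimensional toy vectors in place of CNN-derived features. All function names and data here are illustrative.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], int]
Trainer = Callable[[Sequence[Vector], Sequence[int]], Predictor]


def centroid_trainer(X: Sequence[Vector], y: Sequence[int]) -> Predictor:
    """Toy stand-in for a real classifier (assumption): nearest class centroid."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos:
        return lambda x: 0
    if not neg:
        return lambda x: 1

    def mean(rows: Sequence[Vector]) -> Vector:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c_pos, c_neg = mean(pos), mean(neg)
    return lambda x: 1 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0


def fit_binary_relevance(X, Y, trainer: Trainer):
    """Train one independent binary classifier per label (aspect category)."""
    n_labels = len(Y[0])
    models = [trainer(X, [row[j] for row in Y]) for j in range(n_labels)]
    return lambda x: [m(x) for m in models]


def fit_classifier_chain(X, Y, trainer: Trainer):
    """Train classifiers in sequence; classifier j also sees labels 0..j-1."""
    n_labels = len(Y[0])
    models = []
    for j in range(n_labels):
        # Training uses the true earlier labels as extra input features.
        X_aug = [list(x) + list(row[:j]) for x, row in zip(X, Y)]
        models.append(trainer(X_aug, [row[j] for row in Y]))

    def predict(x: Vector) -> List[int]:
        out: List[int] = []
        for m in models:
            # Prediction feeds back the labels predicted so far.
            out.append(m(list(x) + out))
        return out

    return predict


# Toy data: two features, two labels; label j is on when feature j is 1.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_train = [[0, 0], [0, 1], [1, 0], [1, 1]]

br = fit_binary_relevance(X_train, Y_train, centroid_trainer)
cc = fit_classifier_chain(X_train, Y_train, centroid_trainer)
print(br([1.0, 0.0]), cc([0.0, 1.0]))
```

Binary relevance treats every aspect category independently, whereas the classifier chain lets later classifiers condition on earlier labels, which can help when categories co-occur. In a real setup the trainer would wrap an actual classifier; if XGBoost were used, the "minimum height" value of 3 reported above would presumably correspond to its tree-depth setting, though the record does not confirm that mapping.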
format	Final Project
author	Nurul Azhar, Annisa
title	ASPECT CATEGORIZATION AND SENTIMENT CLASSIFICATION FOR INDONESIAN HOTEL REVIEWS
url	https://digilib.itb.ac.id/gdl/view/40005
_version_	1822925595308195840

Similar Items