ASPECT AND OPINION TERMS EXTRACTION USING DOUBLE EMBEDDINGS AND ATTENTION MECHANISM FOR INDONESIAN TEXT REVIEWS
Aspect-based sentiment analysis (ABSA) of product or service reviews is one way to measure customer satisfaction. The double embeddings and coupled multi-layer attentions approach yields better performance than the best system in SemEval-2016 Task 5 for aspect and opinion term extraction...
Saved in:
Main Author:
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/39582
Institution: Institut Teknologi Bandung
Summary: Aspect-based sentiment analysis (ABSA) of product or service reviews is one way to measure customer satisfaction. The double embeddings and coupled multi-layer attentions approach yields better performance than the best system in SemEval-2016 Task 5 for aspect and opinion term extraction. This thesis adapted both approaches to perform aspect and opinion term extraction for Indonesian hotel reviews.
The double embeddings approach was adapted by experimenting with various types of word embeddings and by training the embeddings on Indonesian resources, namely the Indonesian Wikipedia corpus and Indonesian hotel reviews. The coupled multi-layer attentions approach was adapted by trying different RNN variants in the model.
The experiments were conducted on 5000 hotel reviews, divided into 3000 reviews for training, 1000 for validation, and 1000 for testing. Based on the experimental results, the best model configuration for word embeddings type, RNN type, number of hidden units, number of coupled attention layers, number of tensors, and dropout rate is, respectively: double embeddings, BiLSTM, 50, 2, 20, and 0.5. The F1-measure scores on the test data are 0.914 at the token level and 0.90 at the entity level, better than the baseline model, Bidirectional Long Short-Term Memory with Conditional Random Field (BiLSTM-CRF), which achieved F1-measure scores of 0.895 and 0.885 at the token level and entity level respectively.
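The token-level and entity-level F1 scores reported above measure different things: token-level F1 credits each correctly tagged token, while entity-level F1 credits a term only when its full span and type match exactly. A minimal sketch of the distinction, assuming BIO-style tags (the tag names and helper functions here are illustrative, not taken from the thesis):

```python
def token_f1(gold, pred):
    """Token-level F1: every non-O tag is scored independently."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    fp = sum(1 for g, p in zip(gold, pred) if p != "O" and g != p)
    fn = sum(1 for g, p in zip(gold, pred) if g != "O" and g != p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def spans(tags):
    """Collect (start, end, type) spans from a BIO sequence."""
    out, start = [], None
    for i, t in enumerate(tags + ["O"]):       # sentinel flushes the last span
        if start is not None and not t.startswith("I-"):
            out.append((start, i, tags[start][2:]))
            start = None
        if t.startswith("B-"):
            start = i
    return set(out)

def entity_f1(gold, pred):
    """Entity-level F1: a term counts only if boundaries and type match exactly."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, if the model tags only the first word of a two-word aspect term, token-level F1 still rewards that token, but entity-level F1 counts the whole term as missed, which is why the entity-level score is the lower of the two.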