TRANSFER LEARNING AND SPAN-BASED REPRESENTATION FOR OPINION TRIPLET EXTRACTION FOR ASPECT-BASED SENTIMENT ANALYSIS

Bibliographic Details
Main Author: Ahmad Genadi, Rifo
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/61880
Institution: Institut Teknologi Bandung
Description
Summary: Aspect-based sentiment analysis helps provide an overview of public opinion on a particular product or topic. One of its subtasks is opinion triplet extraction, which produces a list of triplets, each consisting of an aspect expression, a sentiment expression, and the sentiment polarity expressed in a review sentence. One way to extract opinion triplets is to classify span representations. The advantage of this approach is that it handles several subtasks at once, which helps reduce inconsistencies among model predictions. In addition, tokenization and transfer learning from language models such as BERT help with out-of-vocabulary (OOV) cases. This study focuses on extracting opinion triplets with span-based representations while using transfer learning, the current state of the art in NLP, to carry out this task. Opinion triplet extraction with span-based representations can be done by modifying the SpanMLT framework so that the relation scorer performs not only binary classification of whether a relation exists between a pair of spans, but multiclass classification of whether the pair has a positive, negative, or no relation. Adjustments were also made to the selection of the top-k candidate spans to be paired and to the feed-forward neural network (FFNN) component of the relation scorer. This study uses Indonesian-language hotel reviews as a case study. Language models such as IndoBERT can be used as the base encoder of the framework. Based on the experimental results, the best model configuration for hotel reviews applies post-training to the language model, sets the maximum span length to four, sets the proportion of top-k candidate spans selected to 0.4, and sets the weighting ratio between the term scorer and the relation scorer to one.
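The span pipeline described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the label constants, and the way scores are supplied are hypothetical. It also assumes that the 0.4 ratio is taken relative to the sentence length and that the weighting ratio multiplies the relation loss before it is added to the term loss; the summary states neither detail.

```python
from itertools import product
from math import ceil

# Three-way relation labels, following the summary's description of the
# modified relation scorer (the constant names themselves are hypothetical).
RELATION_LABELS = ("POSITIVE", "NEGATIVE", "UNRELATED")


def enumerate_spans(tokens, max_len=4):
    """Return every (start, end) token span, end inclusive, of at most max_len tokens."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i, min(i + max_len, n))]


def prune_top_k(spans, scores, n_tokens, ratio=0.4):
    """Keep the k highest-scoring spans.

    Assumption: k = ceil(ratio * n_tokens); the summary does not say
    what the 0.4 ratio is measured against.
    """
    k = max(1, ceil(ratio * n_tokens))
    ranked = sorted(zip(spans, scores), key=lambda pair: pair[1], reverse=True)
    return [span for span, _ in ranked[:k]]


def candidate_pairs(aspect_spans, sentiment_spans):
    """Pair every kept aspect span with every kept sentiment span for the relation scorer."""
    return list(product(aspect_spans, sentiment_spans))


def total_loss(term_loss, relation_loss, weight=1.0):
    """Combine both losses; a weight of 1.0 mirrors the reported ratio of one (assumed form)."""
    return term_loss + weight * relation_loss
```

In a real model the scores passed to `prune_top_k` would come from the term scorer on top of the encoder, and each pair returned by `candidate_pairs` would be classified into one of `RELATION_LABELS`.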
Based on the test results, the span-based model has not been able to outperform the baseline models, namely the DOER model from Genadi's final project and IndoBERT fine-tuned on a sequence labelling task; it also has low recall. The span-based model achieved an F1-score of 0.75 on the aspect expression and sentiment expression extraction task and 0.56 on the opinion triplet extraction task on the test data.
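To show how an F1-score for triplet extraction relates to precision and recall, the following sketch computes all three over sets of triplets. It assumes exact matching, where a predicted triplet counts as correct only if its aspect span, sentiment span, and polarity all match a gold triplet. The summary does not state the matching criterion used in the thesis, and the function name is hypothetical.

```python
def triplet_prf(predicted, gold):
    """Return (precision, recall, f1) for two collections of triplets.

    Each triplet is a hashable tuple such as (aspect_span, sentiment_span, polarity).
    Assumption: exact-match evaluation.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, a model that misses many gold triplets is pulled down by its recall even when most of its predictions are correct.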