TRANSFER LEARNING USING POST-TRAINING FOR INDONESIAN ASPECT-BASED SENTIMENT ANALYSIS
| Field | Value |
|---|---|
| Main Author | I Putu Eka Surya Aditya |
| Format | Theses |
| Language | Indonesian |
| Online Access | https://digilib.itb.ac.id/gdl/view/63359 |
| Institution | Institut Teknologi Bandung |

Summary:

ABSTRACT
TRANSFER LEARNING USING POST-TRAINING FOR
INDONESIAN ASPECT-BASED SENTIMENT ANALYSIS
By
I Putu Eka Surya Aditya
NIM: 23530053
(Master’s Program in Informatics)
Aspect-based sentiment analysis plays an important role in business development
because it makes it easier for businesses to evaluate customer feedback on every
aspect of a service. In recent years, pre-trained language models such as ELMo,
BERT, XLM-R, and XLNet have achieved great success in natural language
processing (NLP) tasks, especially aspect-based sentiment analysis. For
Indonesian, there have been several studies on aspect-based sentiment analysis.
The latest, by Azhar and Khodra (2020), used mBERT as a pre-trained language
model and achieved the best performance on hotel-domain review data. The
approach used by Azhar and Khodra (2020) is the use of auxiliary sentences,
adapted from Sun et al. (2019). Another approach that also achieves good
performance on aspect-based sentiment analysis is post-training: Xu et al. (2019)
conducted post-training for the aspect-based sentiment analysis task and joint
post-training on the Review Reading Comprehension (RRC) task. In this study,
tests were conducted to see the effect of post-training and joint post-training on
an aspect-based sentiment classification task using pre-trained language models
different from those in previous research.
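To make the auxiliary-sentence approach concrete, the sketch below shows one common way to turn a (review, aspect) pair into a sentence-pair classification input for a BERT-style model. The aspect names, question template, and checkpoint are illustrative assumptions, not taken from the thesis.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

# Hypothetical hotel aspects; the thesis aspect set is not given here.
ASPECTS = ["kebersihan", "pelayanan", "lokasi"]

def build_pairs(review):
    """Pair the review with one auxiliary question per aspect."""
    return [(review, f"Bagaimana sentimen terhadap aspek {aspect}?")
            for aspect in ASPECTS]

review = "Kamarnya bersih tetapi pelayanannya lambat."
pairs = build_pairs(review)
encoded = tokenizer(
    [r for r, _ in pairs],  # sentence A: the review
    [q for _, q in pairs],  # sentence B: the auxiliary question
    padding=True,
    truncation=True,
    return_tensors="pt",
)
# `encoded` holds one sentence-pair example per aspect, ready for a
# BERT-style sequence-classification head that predicts the sentiment.
```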
In this study, three pre-trained language model families were used: BERT (mBERT
and IndoBERT), XLM-R, and XLNet (English XLNet and Malay XLNet). Two
problem-solving approaches were used: auxiliary sentences (Sun et al., 2019) and
post-training/joint post-training (Xu et al., 2019). The data used in this study
are divided into three types: data for post-training, data for joint post-training,
and data for training and testing. The post-training data are unlabeled hotel
reviews (unsupervised), the joint post-training data are car reviews, and the
training and testing data are the same as those used by Azhar and Khodra (2020).
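Post-training in the sense of Xu et al. (2019) amounts to continuing the masked-language-modelling (MLM) objective on unlabeled in-domain text before fine-tuning. The sketch below shows a minimal version of this idea with the Hugging Face Trainer; the checkpoint, the file name `hotel_reviews.txt`, and the hyperparameters are placeholders rather than the thesis configuration, and XLNet would need its permutation-LM objective instead of MLM.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "indobenchmark/indobert-base-p1"  # or mBERT / XLM-R
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# One unlabeled hotel review per line (placeholder file name).
raw = load_dataset("text", data_files={"train": "hotel_reviews.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="post-trained", num_train_epochs=1),
    train_dataset=tokenized,
    # Randomly masks 15% of tokens, the standard MLM setting.
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
)
trainer.train()
model.save_pretrained("post-trained")  # then fine-tune on the labeled ABSA data
```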
The test results show that IndoBERT performs better than the baseline model
(mBERT) both with and without the post-training approach. Post-training on
XLM-R achieved the best performance, with an F1-score of 0.9875 on the Test 1
data and 0.9614 on the Test 2 data. This model outperformed the baseline
(mBERT without post-training) by 1.04% on the Test 1 data and 2.92% on the
Test 2 data.
This is because XLM-R is trained with many more parameters and a much larger
vocabulary than mBERT. The test results also show that the post-training model
outperformed the joint post-training model across all pre-trained language
models. The model in this thesis achieves the best performance on the Indonesian
hotel review data (HoASA).
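For reference, the comparisons above rest on the F1-score. A minimal sketch of computing it with scikit-learn is shown below, using invented labels rather than thesis data; the averaging mode used in the thesis is not stated here.

```python
from sklearn.metrics import f1_score

# Invented gold and predicted sentiment labels, for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "negative"]

print(f1_score(y_true, y_pred, average="micro"))  # 0.8 for this toy example
```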
Keywords: aspect-based sentiment analysis, NLP, pre-trained language model,
IndoBERT, XLM-R, XLNet, auxiliary sentences, post-training, joint post-training.