DEVELOPMENT OF DOMAIN-SPECIFIC LEXICON FOR ASPECT-BASED SENTIMENT ANALYSIS
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Subjects: | |
Online Access: | https://digilib.itb.ac.id/gdl/view/47955 |
Institution: | Institut Teknologi Bandung |
Summary: | Aspect term extraction is an important step in aspect-based sentiment analysis. The
Sequential Covering method by Ruskanda et al. (2019) improved the
performance of aspect extraction by using an aspect list and an opinion list. However, those
word lists were built manually, which cost significant time and effort.
To reduce that effort, in this final project we developed a system that automatically builds
aspect and opinion lists using word embeddings. The resulting word list is called a
domain-specific lexicon because its scope is expanded from a single
dataset to a whole domain.
We began the development of the domain-specific lexicons by collecting data with a
focused crawler. We then preprocessed the data and used it to build word embeddings.
After that, we extracted the words related to the domain using both supervised
and unsupervised approaches. The resulting domain-specific lexicons were used in
a modified Sequential Covering method. Other aspect extraction methods, such as
Aspectator, Double Propagation, Sequential Covering without a lexicon, and
Sequential Covering with the aspect and opinion lists, were used as baselines.
The best accuracy for aspect and opinion separation was
obtained with an SVM classifier that used 300-dimensional vectors from a CBOW
model as features. For aspect extraction, the best F1 scores of
the modified Sequential Covering method on the Nikon Coolpix 4300 (0.645),
Canon G3 (0.581), Nokia 6610 (0.629), and ABSA16_Restaurants_Train_SB1
(0.705) datasets were higher than those of every baseline except
Sequential Covering with the aspect and opinion lists. The method fell short of that
baseline for the following reasons: aspects or opinions in the test dataset were missing
from the word embedding, no similar aspect or opinion was found in the domain-specific
lexicon, or the labelled data used to build the domain-specific lexicon contained errors. |
---|
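
The summary does not say how related words were extracted from the word embedding. As a rough illustration only, one common unsupervised approach is to grow a lexicon from a few seed words by cosine similarity. In the sketch below, the toy three-dimensional vectors, the seed words, the threshold, and the function names are all hypothetical; a real system would load vectors from a trained model (for example, the 300-dimensional CBOW vectors mentioned above).

```python
import math

# Toy vectors standing in for a trained word embedding (hypothetical values).
EMBEDDING = {
    "lens": [0.9, 0.1, 0.0],
    "zoom": [0.8, 0.2, 0.1],
    "battery": [0.7, 0.3, 0.0],
    "great": [0.1, 0.9, 0.1],
    "blurry": [0.2, 0.8, 0.2],
    "tuesday": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, embedding, threshold=0.9):
    """Return the in-vocabulary seeds plus every word whose cosine
    similarity to at least one seed reaches the threshold."""
    # Seeds missing from the embedding cannot contribute anything.
    known = [s for s in seeds if s in embedding]
    lexicon = set(known)
    for word, vec in embedding.items():
        if word in lexicon:
            continue
        if any(cosine(vec, embedding[s]) >= threshold for s in known):
            lexicon.add(word)
    return sorted(lexicon)


aspect_lexicon = expand_lexicon(["lens"], EMBEDDING)
opinion_lexicon = expand_lexicon(["great"], EMBEDDING)
missing_seed_lexicon = expand_lexicon(["flash"], EMBEDDING)
```

With these toy vectors, the seed "lens" pulls in "zoom" and "battery", while the seed "great" pulls in "blurry". The seed "flash" is not in the vocabulary, so it yields an empty lexicon, which mirrors the first failure reason listed in the summary: a word that is absent from the embedding cannot be matched.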