DEVELOPMENT OF DOMAIN-SPECIFIC LEXICON FOR ASPECT-BASED SENTIMENT ANALYSIS

Bibliographic Details
Main Author: Michelle, Prisila
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/47955
Institution: Institut Teknologi Bandung
Description
Summary: Aspect term extraction is an important step in aspect-based sentiment analysis. The Sequential Covering method by Ruskanda et al. (2019) improved aspect extraction performance by using an aspect list and an opinion list. However, these word lists were built manually, which costs a significant amount of time and effort. To reduce this effort, in this final project we developed a system that automatically builds aspect and opinion lists using word embedding. The resulting word lists are called domain-specific lexicons because their scope is expanded from a single dataset to a whole domain. We began by collecting data with a focused crawler. We then preprocessed the data and used it to build word embeddings. Next, we extracted the words related to the domain with supervised and unsupervised approaches. The resulting domain-specific lexicons were used in a modified Sequential Covering method. Other aspect extraction methods, namely Aspectator, Double Propagation, Sequential Covering without a lexicon, and Sequential Covering with the aspect and opinion lists, were used as baselines. In the aspect/opinion separation experiments, the best accuracy was obtained by an SVM classifier that used 300-dimensional vectors created by the CBOW model as features. For aspect extraction, the best F1 scores of the modified Sequential Covering method on the Nikon Coolpix 4300 (0.645), Canon G3 (0.581), Nokia 6610 (0.629), and ABSA16_Restaurants_Train_SB1 (0.705) datasets were higher than those of every baseline except Sequential Covering with the aspect and opinion lists. The modified method fell short of that baseline for the following reasons: some aspects/opinions in the test datasets are not in the word embedding, no similar aspect/opinion is found in the domain-specific lexicon, or the labelled data used to build the domain-specific lexicons contains errors.
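The idea of growing a lexicon from word embeddings can be illustrated with a minimal sketch. It assumes a simple unsupervised seed-expansion rule (keep every vocabulary word whose cosine similarity to at least one seed aspect or opinion word reaches a threshold); the summary does not state which rule the project actually used, and it does not cover the SVM-based aspect/opinion separation. The toy 3-dimensional vectors, seed words, threshold, and the names `TOY_VECTORS`, `cosine`, and `expand_lexicon` are hypothetical stand-ins for the 300-dimensional CBOW vectors trained on crawled domain text.

```python
import math

# Hypothetical toy "embeddings". In the project, these would be
# 300-dimensional CBOW vectors trained on crawled, preprocessed domain text.
TOY_VECTORS = {
    "battery": [0.9, 0.1, 0.0],
    "lens":    [0.8, 0.2, 0.1],
    "screen":  [0.85, 0.15, 0.05],
    "great":   [0.1, 0.9, 0.2],
    "blurry":  [0.2, 0.8, 0.3],
    "tuesday": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, threshold=0.9):
    """Hypothetical seed expansion: start from the seed words that exist in the
    vocabulary, then add every other word whose cosine similarity to at least
    one of those seeds is >= threshold."""
    known_seeds = [s for s in seeds if s in vectors]
    lexicon = set(known_seeds)
    for word, vec in vectors.items():
        if word in lexicon:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            lexicon.add(word)
    return lexicon


if __name__ == "__main__":
    print(sorted(expand_lexicon(["battery"], TOY_VECTORS)))  # ['battery', 'lens', 'screen']
    print(sorted(expand_lexicon(["great"], TOY_VECTORS)))    # ['blurry', 'great']
```

With the aspect seed "battery", the toy vocabulary yields an aspect lexicon of battery, lens, and screen, while opinion words such as "great" and unrelated words such as "tuesday" stay out. The same words that are missing from the embedding vocabulary can never be added, which mirrors one of the failure reasons listed in the summary.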