DEVELOPMENT OF DOMAIN-SPECIFIC LEXICON FOR ASPECT-BASED SENTIMENT ANALYSIS
Main Author: | Michelle, Prisila |
---|---|
Format: | Final Project |
Language: | Indonesian |
Subjects: | |
Online Access: | https://digilib.itb.ac.id/gdl/view/47955 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:47955 |
---|---|
spelling |
id-itb.:47955, 2020-06-24T21:40:19Z; keywords: domain-specific lexicon, aspect extraction, word embedding |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
topic |
Engineering (engineering and related activities) |
description |
Aspect term extraction is an important step in aspect-based sentiment analysis. The
Sequential Covering method by Ruskanda et al. (2019) improved the performance of
aspect extraction by using aspect and opinion lists. However, these word lists were
built manually, which takes a significant amount of time and effort. To reduce this
effort, in this final project we developed a system that automatically builds aspect
and opinion lists using word embeddings. The resulting word lists are called
domain-specific lexicons because their scope is expanded from a single dataset to
an entire domain.
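The abstract does not specify how related words are selected from the embedding. One common unsupervised approach is to expand a small seed list with every vocabulary word whose cosine similarity to some seed reaches a threshold. The sketch below assumes that approach. `expand_lexicon`, the 0.9 threshold, and the three-dimensional toy vectors are hypothetical and do not come from the project's code or data; real CBOW vectors in this setting would have 300 dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_lexicon(seeds, vectors, threshold=0.8):
    """Hypothetical helper: keep the seed words found in the embedding and add
    every other word whose cosine similarity to at least one seed is >= threshold."""
    known_seeds = [s for s in seeds if s in vectors]
    lexicon = set(known_seeds)
    for word, vec in vectors.items():
        if word not in lexicon and any(
            cosine(vec, vectors[s]) >= threshold for s in known_seeds
        ):
            lexicon.add(word)
    return lexicon

# Toy 3-dimensional vectors (hypothetical; not trained on the project's data).
toy_vectors = {
    "battery": [0.9, 0.1, 0.0],
    "charger": [0.8, 0.2, 0.1],
    "lens": [0.1, 0.9, 0.0],
    "zoom": [0.2, 0.8, 0.1],
    "great": [0.0, 0.1, 0.9],
}

print(sorted(expand_lexicon(["battery", "lens"], toy_vectors, threshold=0.9)))
# → ['battery', 'charger', 'lens', 'zoom']
```

The threshold trades precision for recall. Lowering it admits more loosely related words into the lexicon.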
We began developing the domain-specific lexicons by collecting data with a focused
crawler. We then preprocessed the data and used it to train word embeddings. Next,
we extracted domain-related words using supervised and unsupervised approaches. The
resulting domain-specific lexicons were used in a modified Sequential Covering
method. Other aspect extraction methods, such as Aspectator, Double Propagation,
Sequential Covering without a lexicon, and Sequential Covering with the aspect and
opinion lists, were used as baselines.
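The abstract reports that the best aspect/opinion classifier used vectors from a CBOW (continuous bag-of-words) model. CBOW learns embeddings by predicting each word from the words in a window around it, typically averaging the context words' input vectors in its hidden layer. The sketch below shows only these two data steps, assuming a symmetric window. `cbow_pairs` and `average_context` are hypothetical names. The abstract does not say which library trained the embeddings. In gensim, for example, `Word2Vec` with `sg=0` selects CBOW and `vector_size=300` sets the dimension.

```python
def cbow_pairs(tokens, window=2):
    """Hypothetical helper: build CBOW training examples as (context, target)
    pairs, where context holds up to `window` tokens on each side of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # a one-token sentence has no context to predict from
            pairs.append((context, target))
    return pairs

def average_context(context, embeddings):
    """Hypothetical helper: CBOW's hidden layer, i.e. the element-wise mean
    of the context words' input vectors."""
    dim = len(embeddings[context[0]])
    total = [0.0] * dim
    for word in context:
        for k, x in enumerate(embeddings[word]):
            total[k] += x
    return [x / len(context) for x in total]

sentence = ["the", "battery", "life", "is", "great"]
for context, target in cbow_pairs(sentence, window=1):
    print(context, "->", target)
# ['battery'] -> the
# ['the', 'life'] -> battery
# ['battery', 'is'] -> life
# ['life', 'great'] -> is
# ['is'] -> great

print(average_context(["the", "life"], {"the": [1.0, 2.0], "life": [3.0, 4.0]}))
# → [2.0, 3.0]
```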
For separating aspect terms from opinion terms, the best accuracy was obtained with
an SVM classifier using 300-dimensional vectors from a CBOW model as features. For
aspect extraction, the modified Sequential Covering method achieved its best F1 scores
on Nikon Coolpix 4300 (0.645), Canon G3 (0.581), Nokia 6610 (0.629), and
ABSA16_Restaurants_Train_SB1 (0.705). These scores were higher than those of every
baseline except Sequential Covering with the aspect and opinion lists. The modified
method did not outperform that baseline for several reasons. Some aspects or opinions
in the test datasets were absent from the word embedding. No similar aspect or
opinion was present in the domain-specific lexicon. The labelled data used to build
the domain-specific lexicon contained errors. |
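The F1 scores reported above are the harmonic mean of precision and recall. The abstract does not describe the exact matching protocol. The sketch below therefore assumes exact string matching between the set of extracted aspect terms and the set of gold aspect terms. `precision_recall_f1` and the example terms are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Hypothetical helper: exact-match precision, recall and F1 for two
    collections of aspect terms (duplicates are ignored)."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(
    predicted=["battery", "lens", "price", "menu"],
    gold=["battery", "lens", "zoom"],
)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
# → P=0.500 R=0.667 F1=0.571
```

Evaluations that allow partial (overlapping) matches of multi-word terms would give different numbers from this exact-match version.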
author |
Michelle, Prisila |