SENTIMENT-SPECIFIC WORD EMBEDDING EFFECT ON INDONESIAN SENTIMENT ANALYSIS
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/20853 |
Institution: | Institut Teknologi Bandung |
Summary: | Sentiment analysis aims to determine the sentiment polarity of opinions expressed in text data. The feature representation of the text has a significant influence on the performance of a sentiment analysis system. In addition to lexical representations such as bag of words and TF-IDF, word embeddings, which represent the semantic features of words, have been widely used in sentiment analysis research. However, general word embeddings model only semantics and do not take the sentiment of a word into account.
Sentiment-specific word embedding (SSWE) is a representation that models not only semantics but also word sentiment. SSWE produces an n-dimensional feature vector for each word in the training corpus. The embedding is trained with an artificial neural network using backpropagation. To date, no Indonesian sentiment analysis research using SSWE has been found. This final project examines the influence of SSWE on sentiment classification in Indonesian.
The corpus and dataset were collected from TripAdvisor reviews. A total of 306,448 reviews were used as the SSWE corpus and 12,389 reviews as training data. The experiments showed that SSWE improves sentiment classification performance compared with Word2Vec. Using an artificial neural network classification model, SSWE reached an F1-score of 0.7602 on the test set and 0.7687 in 10-fold cross-validation. However, the F1-scores of both SSWE and Word2Vec remained below those of the TF-IDF baseline, which reached 0.8521 on the test set and 0.8492 in 10-fold cross-validation. |
---|
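
The summary reports F1-scores on a test set and under 10-fold cross-validation. As a rough illustration of what those two measurements involve, the sketch below computes a single-class F1-score and builds 10 contiguous folds. It is not the project's code: the function names `f1_score` and `k_fold_indices` are hypothetical, the record does not say whether F1 was computed per class or averaged, and using 12,389 as the number of cross-validated samples is an assumption made only for the example.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold. The first
    n_samples % k folds receive one extra sample so every sample is
    tested exactly once.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Toy labels (1 = positive, 0 = negative), invented for illustration only.
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
print(round(f1_score(toy_true, toy_pred), 4))

# Assumed sample count; each fold holds out roughly one tenth of the data.
folds = k_fold_indices(12389, k=10)
print(len(folds), len(folds[0][1]), len(folds[-1][1]))
```

In a cross-validation run, a classifier would be trained on each fold's training indices, scored with `f1_score` on the held-out indices, and the per-fold scores averaged. In practice the data would usually be shuffled before splitting, which this sketch omits.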