SENTIMENT-SPECIFIC WORD EMBEDDING EFFECT ON INDONESIAN SENTIMENT ANALYSIS

Bibliographic Details
Main Author: NAUFAL FARHAN (NIM : 13513049), AHMAD
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/20853
Institution: Institut Teknologi Bandung
Description
Summary: Sentiment analysis aims to determine the sentiment polarity of opinions expressed in text data. The feature representation of the text has a significant influence on the performance of a sentiment analysis system. In addition to lexical representations such as bag of words and TF-IDF, word embeddings, which represent the semantic features of words, have been widely used in sentiment analysis research. However, general-purpose word embeddings model only semantics and do not take the sentiment of words into account.

Sentiment-specific word embedding (SSWE) is a representation that models both the semantics and the sentiment of words. SSWE produces an n-dimensional feature vector for each word in the training corpus. The vectors are obtained by training the embedding with an artificial neural network and the backpropagation algorithm. At the time of writing, no Indonesian sentiment analysis research using SSWE had been found. This final project examines the influence of SSWE on sentiment classification for the Indonesian language.

The corpus and dataset were collected from TripAdvisor reviews. A total of 306,448 reviews were used as the SSWE corpus and 12,389 reviews as the training data. The experiments showed that SSWE yields better sentiment classification performance than Word2Vec. With an artificial neural network classifier, SSWE reached an F1-score of 0.7602 on the test set and 0.7687 under 10-fold cross-validation. However, the F1-scores of both SSWE and Word2Vec remained below those of the TF-IDF baseline, which reached 0.8521 on the test set and 0.8492 under 10-fold cross-validation.
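For readers unfamiliar with the two evaluation settings reported in the summary, the following is a minimal Python sketch of how an F1-score and a 10-fold split can be computed. It is an illustration only, not the thesis code: the function names `f1_score` and `k_fold_indices` are hypothetical, the labels in the usage note are made up, and this record does not say which F1 averaging scheme or fold assignment the thesis used.

```python
from typing import Iterator, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for a single class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous, non-overlapping folds.

    Assumption: folds are contiguous and unshuffled; a real experiment
    would usually shuffle (and possibly stratify) the data first.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

As a usage example, `f1_score([1, 1, 0, 0], [1, 0, 1, 0])` has one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1-score is 0.5. In a cross-validation run, a classifier would be trained on each `train_idx`, scored with `f1_score` on the matching `test_idx`, and the ten scores averaged.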