SENTIMENT-SPECIFIC WORD EMBEDDING EFFECT ON INDONESIAN SENTIMENT ANALYSIS

Sentiment analysis aims to determine the sentiment polarity of opinions in text data. The feature representation of the text has a significant influence on the performance of a sentiment analysis system. In addition to the lexical representations of bag of words and TF-IDF, word embedding as a representati...

Bibliographic Details
Main Author: NAUFAL FARHAN (NIM : 13513049), AHMAD
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/20853
Institution: Institut Teknologi Bandung
id id-itb.:20853
spelling id-itb.:20853 2017-10-09T10:28:08Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Sentiment analysis aims to determine the sentiment polarity of opinions in text data. The feature representation of the text has a significant influence on the performance of a sentiment analysis system. In addition to the lexical representations of bag of words and TF-IDF, word embedding, which represents the semantic features of words, has been widely used in sentiment analysis research. However, general word embedding models only semantics and does not take the sentiment of a word into account.

Sentiment-specific word embedding (SSWE) is a representation that models not only semantics but also word sentiment. SSWE produces an n-dimensional feature vector for each word in the training corpus. The model is obtained by training the embedding with an artificial neural network and the backpropagation algorithm. To date, no Indonesian sentiment analysis research using SSWE has been found. This final project examines the influence of SSWE on sentiment classification in Indonesian.

The corpus and dataset were collected from TripAdvisor reviews. A total of 306,448 reviews were used as the SSWE corpus and 12,389 reviews as the training data. The experimental results show that SSWE improves sentiment classification performance compared with Word2Vec. Using an artificial neural network classification model, the F1-score obtained with SSWE reached 0.7602 on the test set and 0.7687 under 10-fold cross-validation. However, the F1-scores of both SSWE and Word2Vec remained below those of the TF-IDF feature baseline experiments, which reached 0.8521 on the test set and 0.8492 under 10-fold cross-validation.
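
The description above reports results as F1-scores on a held-out test set and under 10-fold cross-validation. As a rough illustration of those two ideas only, here is a minimal Python sketch. It assumes a binary F1 computed for a single positive label and contiguous, unshuffled folds; the record does not say which F1 variant or fold scheme the thesis used, and the function names `f1_score` and `kfold_indices` are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` label.

    Assumption: a single positive class; the thesis may have used a
    macro- or weighted-average F1 instead.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def kfold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs. Assumption:
    no shuffling or stratification, which a real experiment would likely add.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Toy labels, not data from the thesis.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(round(f1_score(y_true, y_pred), 4))
    print([len(test) for _, test in kfold_indices(25, k=10)])
```

In a cross-validation run, a classifier would be trained on each `train` index list and scored with `f1_score` on the matching `test` list, and the per-fold scores would then be averaged.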