INDONESIAN SENTIMENT ANALYSIS USING DEEP NEURAL NETWORK AND DOCUMENT REPRESENTATION VECTOR
Sentiment analysis, one of the fields of natural language processing, can be performed at several extraction levels: document level, sentence level, and aspect level. This research focuses on the document level. Document-level sentiment analysis is carried out to extract sentiments or opinions regarding...
Main Author: | Ayu Putu Ari Crisdayanti, Ida |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/39113 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:39113 |
---|---|
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Sentiment analysis, one of the fields of natural language processing, can be performed at several extraction levels: document level, sentence level, and aspect level. This research focuses on the document level. Document-level sentiment analysis extracts the sentiment or opinion expressed about an entity as a whole. The sentiment analysis problem can be addressed with a Deep Neural Network (DNN) approach.
The DNN topologies used in the experiments are the Convolutional Neural Network (CNN), Gated Recurrent Neural Networks (GRNN), namely Bi-LSTM and Bi-GRU, and the Hierarchical Deep Neural Network (HDNN). To build an Indonesian sentiment analysis model, a DNN requires documents to be represented as numerical vectors. The experiments therefore also examine the effectiveness of document representation vectors produced by the paragraph vector and deep document embedding techniques. Both techniques aim to capture the full context of a document in order to maximize the performance of the sentiment analysis model.
The experiments were conducted on two datasets: the TripAdvisor dataset from the research baseline and a dataset from Prosa.ai, a collection of texts gathered from Twitter, Zomato, Facebook, Instagram, and Qraved. The results show that the best DNN model for the TripAdvisor dataset is the CNN, with an f1-score of 0.8341, which outperforms the best model in the research baseline. On the Prosa dataset, all DNN models also outperform the baseline, each with an f1-score above 90%. On the Prosa dataset, the paragraph vector document representation increases the f1-score of the sentiment analysis model by 1.4648% to 2.4401%. On the TripAdvisor dataset, however, the paragraph vector does not improve model performance because many text documents in the training and test data are incomplete. |
format |
Final Project |
author |
Ayu Putu Ari Crisdayanti, Ida |
title |
INDONESIAN SENTIMENT ANALYSIS USING DEEP NEURAL NETWORK AND DOCUMENT REPRESENTATION VECTOR |
url |
https://digilib.itb.ac.id/gdl/view/39113 |
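
The description above reports model quality as f1-scores. As a reminder of how that metric is computed, here is a minimal Python sketch. The function names `f1_score` and `macro_f1` and the toy labels are illustrative assumptions, not taken from the thesis, and the record does not state whether the reported scores are per-class or averaged across classes.

```python
from typing import Sequence


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 for a single class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)


# Toy labels invented for illustration; they are not data from the thesis.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]

print(round(f1_score(y_true, y_pred, "positive"), 4))
print(round(macro_f1(y_true, y_pred), 4))
```

Because F1 combines precision and recall, a model cannot score well by favoring one at the expense of the other, which makes it a common choice for sentiment datasets with uneven class sizes.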