Sentiment analysis on movie reviews
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/153196 |
Institution: | Nanyang Technological University |
Summary: | With the rapid growth of the digital world, people use their smart devices to share
opinions on online platforms at any time. A large amount of unstructured information is
generated every day that can be mined and turned into meaningful digital outputs.
Natural Language Processing (NLP) aims to extract information from raw text and derive
insights from it by performing different computational tasks. Sentiment analysis, also
known as opinion mining, is one such NLP task. It extracts people’s opinions, feelings,
and emotions by analysing textual data. It has shown its
research value through various real-life applications such as collecting customer feedback,
performing product analysis, and monitoring the company brand and its reputation. Sentiment
analysis has been studied by many researchers for decades with many remarkable solutions
using rule-based and machine learning approaches.
This project aims to evaluate the performance of a supervised machine learning approach in
sentiment analysis. Binary sentiment classification and fine-grained classification are the two
subtasks of sentiment analysis. Fine-grained classification is the more challenging of the
two because it expands polarity into five levels: very positive, positive, neutral, negative,
and very negative. With more classes to choose from, a model must predict more precisely
and is more likely to predict wrongly. This project therefore focuses on binary sentiment
classification, which assigns each text to one of two classes: positive or negative.
The approach combines traditional classification algorithms and neural networks.
Traditional classifiers such as Naïve Bayes (NB), Support Vector Machine (SVM), and
Logistic Regression are implemented to classify the polarity of text as positive or
negative. In addition, neural networks including Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) are implemented. Lastly, different
word vectorization and word embedding methods are compared to evaluate their impact on
model performance. Among the traditional classifiers, SVM and NB outperformed the other
models, with TF-IDF as the best-performing word vectorization method. Among the neural
networks, the RNN models outperformed the others. Training time is reduced
significantly with GloVe word embeddings, while performance remains comparable to that of
models using embedding layers from the Keras library. |
---|
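
The summary reports that Naïve Bayes with TF-IDF features was among the strongest traditional setups for binary sentiment classification. The record does not include the project's code, so the following is only a minimal illustrative sketch of that general idea, written with the Python standard library. The function names, the four-review toy dataset, the smoothed IDF formula, and the smoothing parameter `alpha` are all assumptions made for illustration, not the author's implementation or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase + whitespace split.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(text, idf):
    """TF-IDF weights for one document; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes that treats TF-IDF weights as fractional counts."""
    weight = defaultdict(lambda: defaultdict(float))
    class_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weight[label][term] += w
    model = {}
    for label, n_docs in class_docs.items():
        # Additive (Laplace-style) smoothing over the training vocabulary.
        total = sum(weight[label].values()) + alpha * len(idf)
        log_like = {t: math.log((weight[label][t] + alpha) / total) for t in idf}
        model[label] = (math.log(n_docs / len(docs)), log_like)
    return model


def predict(text, idf, model):
    """Return the class with the highest log-posterior score."""
    feats = tfidf(text, idf)
    scores = {
        label: prior + sum(w * log_like[t] for t, w in feats.items())
        for label, (prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


# Toy training data (made up for this sketch, not from the project).
train_docs = [
    "a wonderful and moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull and awful film",
]
train_labels = ["positive", "positive", "negative", "negative"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)

print(predict("a great and wonderful story", idf, model))
print(predict("terrible and boring", idf, model))
```

In practice a library such as scikit-learn would normally replace the hand-written vectorizer and classifier. The sketch only shows how TF-IDF weights feed a binary positive/negative decision.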