Sentiment analysis on movie reviews

Bibliographic Details
Main Author: Wang, Wen
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/153196
Institution: Nanyang Technological University
Description
Summary: With the rapid growth of the digital world, people use their smart devices to share opinions on online platforms at any time. A large amount of unstructured information is generated every day that can be mined and turned into meaningful digital outputs. Natural Language Processing (NLP) aims to extract information from raw text and derive insights by performing different computational tasks. Sentiment analysis, also known as opinion mining, is one such NLP task. It extracts people’s opinions, feelings, and emotions by analysing textual data. It has shown its research value through real-life applications such as collecting customer feedback, performing product analysis, and monitoring a company's brand and reputation. Sentiment analysis has been studied for decades, with many remarkable solutions using rule-based and machine learning approaches. This project aims to evaluate the performance of a supervised machine learning approach to sentiment analysis. Binary sentiment classification and fine-grained classification are the two subtasks of sentiment analysis. Fine-grained classification is more challenging because it expands polarity into five levels: very positive, positive, neutral, negative, and very negative. With more classes to choose from, a model must predict more precisely and has a higher probability of making a wrong prediction. Thus, this project focuses on binary sentiment classification, which assigns text to one of two classes: positive or negative. The approach consists of a group of traditional classification algorithms and neural networks. Traditional classification algorithms such as Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression, and more are implemented to predict the polarity of text as positive or negative.
In addition, various neural networks, including Convolutional Neural Network and Recurrent Neural Network (RNN), are implemented. Lastly, different word vectorization and word embedding methods are used to evaluate their impact on model performance. Analysis and comparison of the models show that, among the traditional classifiers, SVM and NB outperformed the others, with TF-IDF as the optimal word vectorization method. Among the neural networks, the RNN models outperformed the others. With GloVe word embeddings, training time is reduced significantly while performance remains comparable to that of models using embedding layers from the Keras library.
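
The combination of TF-IDF vectorization with a Naïve Bayes classifier mentioned in the summary can be illustrated with a minimal, standard-library-only sketch. This is not the project's implementation, which is not shown in this record. The toy reviews, the function names (`fit_idf`, `tfidf`, `train_nb`, `predict`), and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines do more.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}

def tfidf(doc, idf):
    """Map a document to {token: tf * idf}; tokens unseen in training are dropped."""
    tf = Counter(tokenize(doc))
    return {tok: count * idf[tok] for tok, count in tf.items() if tok in idf}

def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    weight = defaultdict(lambda: defaultdict(float))
    class_count = Counter(labels)
    for doc, label in zip(docs, labels):
        for tok, w in tfidf(doc, idf).items():
            weight[label][tok] += w
    vocab = list(idf)
    model = {}
    for label, count in class_count.items():
        total = sum(weight[label].values()) + alpha * len(vocab)
        log_prior = math.log(count / len(docs))
        log_like = {tok: math.log((weight[label][tok] + alpha) / total)
                    for tok in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict(doc, idf, model):
    """Return the class with the highest log posterior score."""
    vec = tfidf(doc, idf)
    scores = {label: log_prior + sum(w * log_like[tok] for tok, w in vec.items())
              for label, (log_prior, log_like) in model.items()}
    return max(scores, key=scores.get)

# Hypothetical toy corpus, invented for this sketch (not the project's data).
train_docs = [
    "a wonderful and moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull and awful film",
]
train_labels = ["positive", "positive", "negative", "negative"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)

print(predict("great moving story", idf, model))
print(predict("terrible and boring", idf, model))
```

In practice, a library implementation of the vectorizer and classifiers would likely replace these hand-written functions. The sketch only shows how TF-IDF weights feed a binary positive/negative decision.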