Lexical knowledge-based machine learning method for sentiment analysis

Sentiment analysis and classification require labelled reviews (each marked as positive or negative) before any further data mining or natural language processing can be done. Reviews are usually labelled manually, which is time-consuming and demanding. In this paper, we propose...

Bibliographic Details
Main Author: Heng, Lai Xiang
Other Authors: Cong Gao
Format: Final Year Project
Language: English
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10356/62824
Institution: Nanyang Technological University
id sg-ntu-dr.10356-62824
record_format dspace
spelling sg-ntu-dr.10356-62824 2023-03-03T20:34:55Z Lexical knowledge-based machine learning method for sentiment analysis Heng, Lai Xiang Cong Gao School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems Bachelor of Engineering (Computer Science) 2015-04-29T07:58:45Z 2015-04-29T07:58:45Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/62824 en Nanyang Technological University 44 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems
Heng, Lai Xiang
Lexical knowledge-based machine learning method for sentiment analysis
description Sentiment analysis and classification require labelled reviews (each marked as positive or negative) before any further data mining or natural language processing can be done. Reviews are usually labelled manually, which is time-consuming and demanding. In this paper, we propose a new learning algorithm that combines supervised learning with pre-compiled opinion lexicons. Because the algorithm does not require manual labelling of reviews, it greatly reduces the manpower and time needed. This project uses customers’ restaurant reviews from the Yelp dataset. The algorithm has five steps: 1) Build two pseudo documents, one positive and one negative. 2) Compute the pairwise document similarity between each review document and the positive and negative documents, using either cosine similarity or Euclidean distance. 3) Label each review as positive or negative based on the similarity results. 4) Rank the reviews. 5) Select the top 2,000 reviews, 1,000 each from the positive and negative labelled documents, for building the sentiment classification model. In this experiment, we examined both Naïve Bayes and Support Vector Machine (SVM) classifiers. Three feature extraction methods were used to train the classifiers: the bag-of-words model, the bag-of-words model with stopwords removed, and significant bigrams. Of the three, significant bigrams performed best, achieving 67% accuracy, whereas the bag-of-words model performed worst for the Naïve Bayes classifier. The SVM classifier, on the other hand, performed well with both the bag-of-words model and the bag-of-words model with stopwords removed, achieving an accuracy of about 99%. However, this may indicate overfitting due to the large, sparse feature space. Nevertheless, the experiment shows that automated labelling of reviews is possible and brings the goal one step closer.
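The five labelling steps in the description above can be illustrated with a minimal Python sketch. It is not the project's implementation. The tiny word lists, the `tokenize` helper, the `pseudo_label` function, and the use of the similarity margin as the ranking criterion are all assumptions made for illustration, since the abstract does not specify the lexicons used or how the reviews are ranked. Only the cosine similarity variant is shown.

```python
import math
from collections import Counter

# Hypothetical toy lexicons (assumption); the project used pre-compiled opinion lexicons.
POSITIVE_WORDS = ["good", "great", "delicious", "friendly", "excellent"]
NEGATIVE_WORDS = ["bad", "terrible", "rude", "cold", "awful"]

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [token for token in tokens if token]


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pseudo_label(reviews, top_n):
    """Return (positive_reviews, negative_reviews), at most top_n of each."""
    # Step 1: build the pseudo positive and negative documents from the lexicons.
    pos_doc = Counter(POSITIVE_WORDS)
    neg_doc = Counter(NEGATIVE_WORDS)

    positives, negatives = [], []
    for review in reviews:
        vec = Counter(tokenize(review))
        # Step 2: similarity between the review and each pseudo document.
        pos_sim = cosine_similarity(vec, pos_doc)
        neg_sim = cosine_similarity(vec, neg_doc)
        # Step 3: label by the closer pseudo document; ties stay unlabelled.
        margin = pos_sim - neg_sim
        if margin > 0:
            positives.append((margin, review))
        elif margin < 0:
            negatives.append((-margin, review))

    # Step 4: rank by similarity margin (assumed ranking criterion).
    positives.sort(key=lambda pair: pair[0], reverse=True)
    negatives.sort(key=lambda pair: pair[0], reverse=True)

    # Step 5: keep the top_n most confident reviews of each class.
    return ([r for _, r in positives[:top_n]],
            [r for _, r in negatives[:top_n]])


if __name__ == "__main__":
    sample = [
        "The food was great and the staff were friendly.",
        "Terrible service and cold food.",
        "We went on a Tuesday.",
    ]
    print(pseudo_label(sample, top_n=1))
```

With `top_n=1000`, the two returned lists would correspond to the 2,000 pseudo-labelled reviews that the description says are passed on to classifier training. Reviews that match neither lexicon are left unlabelled in this sketch, which is a design choice of the example rather than something stated in the abstract.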
author2 Cong Gao
author_facet Cong Gao
Heng, Lai Xiang
format Final Year Project
author Heng, Lai Xiang
author_sort Heng, Lai Xiang
title Lexical knowledge-based machine learning method for sentiment analysis
title_short Lexical knowledge-based machine learning method for sentiment analysis
title_full Lexical knowledge-based machine learning method for sentiment analysis
title_fullStr Lexical knowledge-based machine learning method for sentiment analysis
title_full_unstemmed Lexical knowledge-based machine learning method for sentiment analysis
title_sort lexical knowledge-based machine learning method for sentiment analysis
publishDate 2015
url http://hdl.handle.net/10356/62824
_version_ 1759854211632201728