Lexical knowledge-based machine learning method for sentiment analysis
Before any sentiment analysis or classification can be done, one needs labelled reviews (each carrying a positive or negative sentiment) for further data mining or natural language processing. Labelling of reviews is done manually and is usually time-consuming and demanding. In this paper, we proposed...
Main Author: | Heng, Lai Xiang |
---|---|
Other Authors: | Cong Gao |
Format: | Final Year Project |
Language: | English |
Published: | 2015 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems |
Online Access: | http://hdl.handle.net/10356/62824 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-62824 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-62824 2023-03-03T20:34:55Z Lexical knowledge-based machine learning method for sentiment analysis Heng, Lai Xiang Cong Gao School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems Bachelor of Engineering (Computer Science) 2015-04-29T07:58:45Z 2015-04-29T07:58:45Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/62824 en Nanyang Technological University 44 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems |
description |
Before any sentiment analysis or classification can be done, one needs labelled reviews (each carrying a positive or negative sentiment) for further data mining or natural language processing. Labelling of reviews is done manually and is usually time-consuming and demanding. In this paper, we propose a new learning algorithm that combines supervised learning with pre-compiled opinion lexicons. Using this algorithm, the manpower and time needed are greatly reduced, as it does not require manual labelling of reviews. For this project, customers’ reviews of restaurants are drawn from the Yelp dataset. The new algorithm has five steps: 1) Build two pseudo documents, one positive and one negative. 2) Compute the pairwise document similarity between each review document and the positive and negative documents, using either the Cosine Similarity or the Euclidean Distance approach. 3) Label each review as positive or negative based on the similarity results. 4) Rank the reviews. 5) Select the top 2,000 reviews, 1,000 each from the positive-labelled and negative-labelled documents, for building the sentiment classification model. In this experiment, we looked into both Naïve Bayes and Support Vector Machine (SVM) classifiers. Three feature extraction methods are used for training the classifiers: the bag of words model, the bag of words model with stopwords removed, and significant bigrams. Of the three, significant bigrams performed best, achieving 67% accuracy, whereas the bag of words model performed worst for the Naïve Bayes classifier. The SVM classifier, on the other hand, performs well with both the bag of words model and the bag of words model with stopwords removed, achieving an accuracy of about 99%. However, this may indicate overfitting due to the large, sparse feature set. Nevertheless, this experiment shows that automated labelling of the reviews is possible, and it brings the work one step closer to that goal. |
author2 |
Cong Gao |
author_facet |
Cong Gao Heng, Lai Xiang |
format |
Final Year Project |
author |
Heng, Lai Xiang |
title |
Lexical knowledge-based machine learning method for sentiment analysis |
publishDate |
2015 |
url |
http://hdl.handle.net/10356/62824 |
_version_ |
1759854211632201728 |
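
The five labelling steps listed in the description can be illustrated with a minimal Python sketch. This is not the project's code: the two word lists are tiny hypothetical stand-ins for the pre-compiled opinion lexicons, the tokenizer is deliberately crude, and ranking by the gap between the two similarity scores is an assumption, since the record does not say how reviews are ranked. Only the cosine similarity variant is shown.

```python
import math
from collections import Counter

# Hypothetical toy lexicons; the project used pre-compiled opinion lexicons.
POSITIVE_WORDS = ["good", "great", "delicious", "friendly", "excellent"]
NEGATIVE_WORDS = ["bad", "terrible", "bland", "rude", "slow"]

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [token for token in tokens if token]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pseudo_label(reviews, top_n):
    """Return up to top_n positive and top_n negative (review, label) pairs."""
    # Step 1: build the pseudo positive and negative documents.
    pos_doc = Counter(POSITIVE_WORDS)
    neg_doc = Counter(NEGATIVE_WORDS)

    scored = []
    for review in reviews:
        vec = Counter(tokenize(review))
        # Step 2: similarity of the review to each pseudo document.
        pos_sim = cosine(vec, pos_doc)
        neg_sim = cosine(vec, neg_doc)
        if pos_sim == neg_sim:
            continue  # assumption: ties are left unlabelled
        # Step 3: label by the closer pseudo document.
        label = "pos" if pos_sim > neg_sim else "neg"
        scored.append((abs(pos_sim - neg_sim), label, review))

    # Step 4: rank by similarity gap (assumed ranking criterion).
    scored.sort(key=lambda item: item[0], reverse=True)

    # Step 5: keep the top_n reviews of each label as training data.
    positives = [(r, lab) for _, lab, r in scored if lab == "pos"][:top_n]
    negatives = [(r, lab) for _, lab, r in scored if lab == "neg"][:top_n]
    return positives + negatives


reviews = [
    "Great food and friendly staff!",
    "Terrible service, rude waiter.",
    "We went on a Tuesday.",
    "Good.",
]
labelled = pseudo_label(reviews, top_n=1)
print(labelled)
```

In the project, `top_n` would be 1,000 per class, and the resulting pseudo-labelled reviews would then be used to train the Naïve Bayes and SVM classifiers. Note that cosine similarity favours short reviews made up mostly of lexicon words, which is one reason the ranking criterion matters.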