Sentiment analysis based on combination of term weighting schemes and word vectors
Term weighting schemes are widely used in text mining, and supervised term weighting schemes perform better on sentiment analysis because the labels of the training documents make the learned model more discriminative. In this thesis, based on bag of words model, we introdu...
Main Author: | Jin, Linbo |
---|---|
Other Authors: | Mao Kezhi |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | http://hdl.handle.net/10356/68724 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-68724 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-68724 2023-07-04T15:04:31Z Sentiment analysis based on combination of term weighting schemes and word vectors Jin, Linbo Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Term weighting schemes are widely used in text mining, and supervised term weighting schemes perform better on sentiment analysis because the labels of the training documents make the learned model more discriminative. In this thesis, we introduce three supervised term weighting schemes based on the bag-of-words model and show their effectiveness for sentiment analysis in experiments. We also introduce word vectors and use cosine similarity to measure the intrinsic relationships between words, which helps overcome the data sparsity problem. Building on term weighting schemes and word vectors, we propose two ways to use word vectors in sentiment analysis systems. The first combines word vectors with the introduced term weighting schemes through vector multiplication to generate effective document feature vectors. The second applies the introduced supervised weighting schemes to bag-of-words models whose features are binary term frequencies, and uses word vectors to relate unseen words in test documents to words in the training documents and thereby predict the weights of the unseen words. Our experimental results show that supervised term weighting schemes and the intrinsic information among words captured by word vectors jointly improve the performance of sentiment analysis systems. Our methods outperform state-of-the-art methods on datasets of long documents and are competitive on datasets of short documents. Master of Science (Computer Control and Automation) 2016-05-31T03:54:04Z 2016-05-31T03:54:04Z 2016 Thesis http://hdl.handle.net/10356/68724 en 70 p. 
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering Jin, Linbo Sentiment analysis based on combination of term weighting schemes and word vectors |
description |
Term weighting schemes are widely used in text mining, and supervised term weighting schemes perform better on sentiment analysis because the labels of the training documents make the learned model more discriminative. In this thesis, we introduce three supervised term weighting schemes based on the bag-of-words model and show their effectiveness for sentiment analysis in experiments. We also introduce word vectors and use cosine similarity to measure the intrinsic relationships between words, which helps overcome the data sparsity problem. Building on term weighting schemes and word vectors, we propose two ways to use word vectors in sentiment analysis systems. The first combines word vectors with the introduced term weighting schemes through vector multiplication to generate effective document feature vectors. The second applies the introduced supervised weighting schemes to bag-of-words models whose features are binary term frequencies, and uses word vectors to relate unseen words in test documents to words in the training documents and thereby predict the weights of the unseen words. Our experimental results show that supervised term weighting schemes and the intrinsic information among words captured by word vectors jointly improve the performance of sentiment analysis systems. Our methods outperform state-of-the-art methods on datasets of long documents and are competitive on datasets of short documents. |
author2 |
Mao Kezhi |
author_facet |
Mao Kezhi Jin, Linbo |
format |
Theses and Dissertations |
author |
Jin, Linbo |
author_sort |
Jin, Linbo |
title |
Sentiment analysis based on combination of term weighting schemes and word vectors |
title_short |
Sentiment analysis based on combination of term weighting schemes and word vectors |
title_full |
Sentiment analysis based on combination of term weighting schemes and word vectors |
title_fullStr |
Sentiment analysis based on combination of term weighting schemes and word vectors |
title_full_unstemmed |
Sentiment analysis based on combination of term weighting schemes and word vectors |
title_sort |
sentiment analysis based on combination of term weighting schemes and word vectors |
publishDate |
2016 |
url |
http://hdl.handle.net/10356/68724 |
_version_ |
1772825599451070464 |
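
The two ideas in the description (scaling word vectors by supervised term weights to build document vectors, and using cosine similarity to assign weights to words unseen in training) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not name the three weighting schemes, so a smoothed log-odds ratio of document frequencies stands in as an assumed example, "vector multiplication" is assumed to mean scaling each word vector by its scalar weight before summing, and the function names, the toy corpus, and the two-dimensional word vectors are all hypothetical.

```python
import math
from collections import Counter


def log_odds_weights(docs, labels, smoothing=1.0):
    """Assumed supervised scheme: log ratio of smoothed document
    frequencies in positive (label 1) vs. negative (label 0) documents."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for tokens, y in zip(docs, labels):
        # set(tokens) gives binary term presence per document
        (pos_df if y == 1 else neg_df).update(set(tokens))
    vocab = set(pos_df) | set(neg_df)
    return {
        t: math.log(
            ((pos_df[t] + smoothing) / (n_pos + 2 * smoothing))
            / ((neg_df[t] + smoothing) / (n_neg + 2 * smoothing))
        )
        for t in vocab
    }


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_weight(word, weights, vectors):
    """Return the trained weight, or for an unseen word borrow the weight
    of the most cosine-similar training word that has a vector."""
    if word in weights:
        return weights[word]
    if word not in vectors:
        return 0.0
    candidates = [t for t in weights if t in vectors]
    if not candidates:
        return 0.0
    nearest = max(candidates, key=lambda t: cosine(vectors[word], vectors[t]))
    return weights[nearest]


def document_vector(tokens, weights, vectors, dim):
    """Sum of word vectors, each scaled by its (possibly predicted) weight."""
    doc = [0.0] * dim
    for t in set(tokens):  # binary term frequency
        if t not in vectors:
            continue
        w = predict_weight(t, weights, vectors)
        for i, x in enumerate(vectors[t]):
            doc[i] += w * x
    return doc


# Hypothetical toy data: four labeled training documents and 2-d word vectors.
train_docs = [["good", "movie"], ["great", "movie"],
              ["bad", "movie"], ["awful", "plot"]]
train_labels = [1, 1, 0, 0]
vectors = {
    "good": [1.0, 0.1], "great": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2],
    "movie": [0.0, 1.0], "plot": [0.1, 0.9],
    "superb": [0.95, 0.15], "terrible": [-0.95, 0.15],  # unseen in training
}

weights = log_odds_weights(train_docs, train_labels)
test_vec = document_vector(["superb", "movie"], weights, vectors, dim=2)
```

In this toy setup, "superb" never appears in the training documents, so it inherits the weight of its nearest training neighbour in vector space. The resulting `test_vec` could then be passed to any standard classifier; the choice of classifier is outside what the record describes.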