Sentiment analysis based on combination of term weighting schemes and word vectors

Term weighting schemes are widely used in text mining, and supervised term weighting schemes perform better on sentiment analysis because the labels available for training documents make the learned model more discriminative. In this thesis, building on the bag-of-words model, we introdu...

Bibliographic Details
Main Author: Jin, Linbo
Other Authors: Mao Kezhi
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: http://hdl.handle.net/10356/68724
Institution: Nanyang Technological University
id sg-ntu-dr.10356-68724
record_format dspace
spelling sg-ntu-dr.10356-68724; 2023-07-04T15:04:31Z; Sentiment analysis based on combination of term weighting schemes and word vectors; Jin, Linbo; Mao Kezhi; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering; Master of Science (Computer Control and Automation); 2016-05-31T03:54:04Z; 2016; Thesis; http://hdl.handle.net/10356/68724; en; 70 p.; application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
description Term weighting schemes are widely used in text mining, and supervised term weighting schemes perform better on sentiment analysis because the labels available for training documents make the learned model more discriminative. In this thesis, building on the bag-of-words model, we introduce three supervised term weighting schemes and demonstrate their effectiveness for sentiment analysis experimentally. We also introduce word vectors and use cosine similarity to measure the intrinsic relationships between words, which helps overcome the data sparsity problem. Based on term weighting schemes and word vectors, we propose two ways to use word vectors in sentiment analysis systems. In the first, we combine word vectors with the introduced term weighting schemes through vector multiplication to generate effective document feature vectors. In the second, we apply the introduced supervised weighting schemes to bag-of-words models whose features are binary term frequencies, and use word vectors to relate unseen words in test documents to words in training documents and thereby predict the weights of those unseen words. Our experimental results show that supervised term weighting schemes and the intrinsic information among words captured by word vectors jointly improve the performance of sentiment analysis systems. Our methods outperform state-of-the-art methods on datasets of long documents and are competitive on datasets of short documents.
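The two ideas in the description can be illustrated with a small Python sketch. This is not the thesis implementation: the record does not give the three weighting schemes or the exact combination rule, so the nearest-neighbour weight transfer, the weighted average of word vectors, the function names, and the toy vectors and weights below are all assumptions made for illustration.

```python
import math

# Toy data (hypothetical): 2-d word vectors and supervised term weights
# learned only for the words seen in training documents.
vectors = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.0],
    "awful": [-0.9, 0.1],
}
train_weights = {"good": 1.5, "bad": -1.2}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_weight(word, vectors, train_weights):
    """Return the word's training weight, or, for an unseen word, borrow the
    weight of the most cosine-similar training word (an assumed rule)."""
    if word in train_weights:
        return train_weights[word]
    if word not in vectors:
        return 0.0
    candidates = [w for w in train_weights if w in vectors]
    if not candidates:
        return 0.0
    nearest = max(
        candidates, key=lambda w: cosine_similarity(vectors[word], vectors[w])
    )
    return train_weights[nearest]


def document_vector(tokens, vectors, train_weights):
    """Average of term-weight * word-vector over the distinct words of a
    document (binary term frequency). The averaging step is an assumption."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    count = 0
    for token in sorted(set(tokens)):
        if token not in vectors:
            continue
        weight = predict_weight(token, vectors, train_weights)
        total = [t + weight * x for t, x in zip(total, vectors[token])]
        count += 1
    return [t / count for t in total] if count else total
```

In this toy setup, "great" has no training weight, so `predict_weight("great", vectors, train_weights)` falls back to the weight of its nearest training word, "good". A real system would replace the toy dictionaries with pretrained word vectors and weights computed from labelled training documents.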
author2 Mao Kezhi
author Jin, Linbo
title Sentiment analysis based on combination of term weighting schemes and word vectors
publishDate 2016
url http://hdl.handle.net/10356/68724
_version_ 1772825599451070464