Sentiment analysis based on combination of term weighting schemes and word vectors

Bibliographic Details
Main Author: Jin, Linbo
Other Authors: Mao Kezhi
Format: Theses and Dissertations
Language: English
Published: 2016
Online Access: http://hdl.handle.net/10356/68724
Institution: Nanyang Technological University
Description
Summary: Term weighting schemes are widely used in text mining, and supervised term weighting schemes perform better on sentiment analysis because the labels of the training documents make the learned model more discriminative. In this thesis, based on the bag-of-words model, we introduced three supervised term weighting schemes and showed their effectiveness for sentiment analysis in experiments. We also introduced word vectors and used cosine similarity to measure the intrinsic relationships between words, which helps overcome the data sparsity problem. Building on term weighting schemes and word vectors, we proposed two ways to use word vectors in sentiment analysis systems. The first combines word vectors with the introduced term weighting schemes through a vector multiplication operation to generate effective document feature vectors. The second applies the supervised weighting schemes to bag-of-words models whose features are binary term frequencies, and uses word vectors to relate words in unseen test documents to words in the training documents and thereby predict the weights of the unseen test words. Our experimental results show that supervised term weighting schemes and the intrinsic information among words captured by word vectors jointly improve the performance of sentiment analysis systems. Our methods outperform state-of-the-art methods on long-document datasets and perform competitively on short-document datasets.
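
The two ideas in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the summary does not specify the three weighting schemes or the exact multiplication used, so the sketch assumes (a) a document vector built as a weighted sum of word vectors and (b) an unseen word borrowing the weight of its most cosine-similar training word. The names `cosine`, `document_vector`, `predict_weight`, and the toy `vectors` and `weights` values are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def document_vector(tokens, weights, vectors):
    """Idea 1 (assumed form): sum of word vectors, each scaled by its term weight."""
    dim = len(next(iter(vectors.values())))
    doc = [0.0] * dim
    for tok in set(tokens):  # binary term frequency: each distinct word counts once
        if tok in weights and tok in vectors:
            for i, x in enumerate(vectors[tok]):
                doc[i] += weights[tok] * x
    return doc

def predict_weight(word, weights, vectors):
    """Idea 2 (assumed form): borrow the weight of the most similar training word."""
    if word in weights:
        return weights[word]
    if word not in vectors:
        return 0.0  # nothing to compare against
    best = max(
        (w for w in weights if w in vectors),
        key=lambda w: cosine(vectors[word], vectors[w]),
        default=None,
    )
    return weights[best] if best is not None else 0.0

# Toy, made-up data: 2-dimensional word vectors and supervised weights
# learned for the two words seen in training.
vectors = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.0],
    "awful": [-0.9, 0.2],
}
weights = {"good": 2.0, "bad": -2.0}

doc = document_vector(["good", "bad", "good"], weights, vectors)
great_weight = predict_weight("great", weights, vectors)
awful_weight = predict_weight("awful", weights, vectors)
```

In this toy setting "great" was never seen in training, yet it receives the weight of "good" because their vectors point in nearly the same direction, which is the sense in which word vectors can ease data sparsity.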