Sentiment analysis based on combination of term weighting schemes and word vectors
Term weighting schemes are widely used in text mining tasks and supervised term weighting schemes have better performances on sentiment analysis task because the available labels of training documents make the learned model more discriminative. In this thesis, based on bag of words model, we introdu...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: |
2016
|
Subjects: | |
Online Access: | http://hdl.handle.net/10356/68724 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | Term weighting schemes are widely used in text mining tasks and supervised term weighting schemes have better performances on sentiment analysis task because the available labels of training documents make the learned model more discriminative. In this thesis, based on bag of words model, we introduced three supervised term weighting schemes and have shown their effectiveness for sentiment analysis in experiments. We also introduced the advanced word vectors technology and used the cosine similarity technique to measure intrinsic relationship between words to overcome the data sparsity problem. Based on term weighting schemes and word vectors technology, we proposed two kinds of ideas to utilize word vectors in sentiment analysis systems. The first idea lies that we combined word vectors and our introduced term weighting schemes by vector multiplication operation to generate effective document feature vectors. The second one is that, we applied these introduced supervised weighting schemes on bag of words models where binary term frequencies are the features and word vectors are used as a measure to correlate unknown test document words with training document words and predict the weights of unknown testing words. Our experiment results show supervised term weighting schemes and the intrinsic information among words discovered by word vectors can really improve the performance of sentiment analysis system jointly. Our methods outperform the state of the art methods on long-length document datasets and have competitive performances on short-length document datasets. |
---|