Product name and associated user sentiment retrieval from tweets

Bibliographic Details
Main Author: Saraf, Avnish
Other Authors: Gao Cong
Format: Final Year Project
Language: English
Published: 2015
Online Access: http://hdl.handle.net/10356/63473
Institution: Nanyang Technological University
Description
Summary: With a growing share of the web devoted to product reviews, discussions and advertisements on micro-blogging platforms, it has become imperative to develop techniques for automated product name extraction from user-generated short texts. In this report, we propose a system that mines Tweets to extract product name mentions and the corresponding sentiment towards each product. We survey the information retrieval research landscape and adopt a hybrid method for product name extraction. Our novel method combines fuzzy dictionary matching with a CRF-based Named Entity Recognition approach to handle the inconsistencies of user-generated short texts during extraction. We also examine the widely studied field of sentiment mining: we begin by reviewing existing work and then propose a machine-learning-based approach that operates at sentence-level granularity, suitably adapted for micro-blogs. For evaluation, we generate a dataset of 2,032 Tweets, manually annotated with the associated sentiment and the product name mentions. Evaluation on this data shows that our hybrid method outperforms all the existing methods, achieving a Precision of 0.95, a Recall of 0.98 and an F1 score of 0.97, along with 69% accuracy in sentiment analysis. We also provide an extensive comparison of our algorithm with one of the most popular NER systems available, Stanford NER, and show that our method yields a 38% improvement over it on user-generated micro text. A detailed analysis of the individual components is also provided to establish the synergistic performance of the hybrid method compared with the fuzzy dictionary matching method and the CRF method individually.
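
The fuzzy dictionary matching component mentioned in the summary can be illustrated with a minimal Python sketch. This is not the report's implementation: the product list, the `fuzzy_product_mentions` function, the similarity cutoff of 0.8 and the n-gram window of 3 are all assumptions chosen for illustration, and the standard-library `difflib` matcher stands in for whatever similarity measure the author actually used. The CRF-based component is not shown.

```python
import difflib

# Hypothetical product dictionary; the report's actual dictionary is not given here.
PRODUCT_DICTIONARY = ["iphone 6", "galaxy s5", "nexus 5", "xbox one"]


def fuzzy_product_mentions(tweet, dictionary=PRODUCT_DICTIONARY,
                           cutoff=0.8, max_ngram=3):
    """Return dictionary entries that approximately match some n-gram of the tweet.

    cutoff and max_ngram are illustrative assumptions, not values from the report.
    """
    tokens = tweet.lower().split()
    found = []
    # Slide windows of 1..max_ngram tokens over the tweet.
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i:i + n])
            # Keep the single best dictionary entry whose similarity ratio
            # to the candidate is at least the cutoff.
            matches = difflib.get_close_matches(candidate, dictionary,
                                                n=1, cutoff=cutoff)
            if matches and matches[0] not in found:
                found.append(matches[0])
    return found


# A misspelled mention ("iphon 6") still resolves to the dictionary entry.
print(fuzzy_product_mentions("loving my new iphon 6 so much"))
```

In a hybrid setup of the kind the summary describes, matches like these would presumably be combined with the output of a trained CRF tagger; how the two are merged is not specified in this record.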