A framework for mining opinions from user generated content
Format: Theses and Dissertations
Language: English
Published: 2013
Online Access: http://hdl.handle.net/10356/54823
Institution: Nanyang Technological University
Summary: In this thesis we present a scalable framework for mining features and opinions from online reviews. Large-scale opinion mining requires scalable components for data storage, along with unsupervised learning methods for extracting features and opinions, with the ultimate goal of generating meaningful summaries. We built our system using travel reviews, but it can be applied to any domain with minimal changes.
Our focus is on a highly scalable framework: a system that can scale both horizontally and vertically for deployment on large-scale distributed systems. We therefore present an architecture based on a careful examination of every component in the system, including the database used to store reviews. After comparing various options, we chose state-of-the-art open-source technologies that use a distributed, multi-node architecture. As a result, millions of reviews can be stored and indexed.
We have implemented a dynamic feature extraction engine that uses unsupervised learning to associate extracted features with opinions, starting from only a single domain seed feature. For example, the seed word 'hotel' is all that is needed to extract a list of related hotel features such as 'room' and 'service'. Next, we extract the opinions expressed on these dynamically extracted features and perform sentence-level sentiment analysis. To present the results to end users intuitively, we then built a web interface and experimented with new visualization techniques.
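The abstract does not spell out the extraction algorithm. Purely as an illustration, the sketch below bootstraps feature words from a single seed ('hotel') by alternating between two steps: opinion words found near known features are learned, and content words found near known opinion words are promoted to features. It then scores each sentence with a small polarity lexicon and credits that score to every feature the sentence mentions. The lexicon, stopword list, window size, `min_count` threshold and all function names are hypothetical assumptions, and plain co-occurrence windows stand in for whatever unsupervised method the thesis actually uses.

```python
import re
from collections import Counter, defaultdict

# Hypothetical opinion lexicon (word -> polarity); an assumption for this
# sketch, not taken from the thesis. Real systems use much larger lexicons.
OPINION_LEXICON = {
    "clean": 1, "friendly": 1, "great": 1, "spacious": 1, "helpful": 1,
    "dirty": -1, "rude": -1, "noisy": -1, "slow": -1, "small": -1,
}

# Function words that must never be promoted to features (hypothetical list).
STOPWORDS = {"the", "a", "an", "and", "but", "was", "were", "is", "are",
             "not", "very", "too", "it", "of", "at", "in", "with"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", sentence.lower())


def nearby(tokens, i, window):
    """Tokens within `window` positions of index i (including i itself)."""
    return tokens[max(0, i - window): i + window + 1]


def expand_features(reviews, seed, window=3, min_count=2, max_iter=10):
    """Bootstrap feature words from one seed feature by co-occurrence.

    Repeats two steps until nothing new is learned:
      1. lexicon opinion words near a known feature become known opinions;
      2. other content words near a known opinion become candidate features,
         accepted once they co-occur with opinions at least `min_count` times.
    """
    sentences = [tokenize(s) for r in reviews for s in split_sentences(r)]
    features, opinions = {seed}, set()
    for _ in range(max_iter):
        found = set()
        for toks in sentences:
            for i, tok in enumerate(toks):
                if tok in features:
                    found.update(w for w in nearby(toks, i, window) if w in OPINION_LEXICON)
        known = opinions | found
        candidates = Counter()
        for toks in sentences:
            for i, tok in enumerate(toks):
                if tok in known:
                    candidates.update(
                        w for w in nearby(toks, i, window)
                        if w not in STOPWORDS and w not in OPINION_LEXICON and w not in features
                    )
        new_features = {w for w, c in candidates.items() if c >= min_count}
        if found <= opinions and not new_features:
            break  # converged: no new opinions or features learned
        opinions |= found
        features |= new_features
    return features, opinions


def sentence_sentiment(tokens):
    """Sum lexicon polarities; a directly preceding 'not' flips a word's polarity."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in OPINION_LEXICON:
            polarity = OPINION_LEXICON[tok]
            if i > 0 and tokens[i - 1] == "not":
                polarity = -polarity
            score += polarity
    return score


def summarize(reviews, features):
    """Count positive and negative sentences that mention each feature."""
    summary = defaultdict(lambda: {"pos": 0, "neg": 0})
    for review in reviews:
        for sentence in split_sentences(review):
            toks = tokenize(sentence)
            score = sentence_sentiment(toks)
            if score == 0:
                continue  # neutral or mixed sentence
            for feature in set(toks) & features:
                summary[feature]["pos" if score > 0 else "neg"] += 1
    return dict(summary)


if __name__ == "__main__":
    reviews = [
        "The hotel was clean. The room was clean and the service was friendly.",
        "Friendly hotel. The staff were friendly and the room was noisy.",
        "The service was slow. The staff were rude.",
        "The room was not clean.",
    ]
    features, opinions = expand_features(reviews, "hotel")
    print(sorted(features), sorted(opinions))
    print(summarize(reviews, features))
```

With the four sample reviews above, 'clean' and 'friendly' are learned from sentences about the hotel, and 'room' and 'service' are then promoted because each co-occurs with those opinion words at least twice; 'staff' co-occurs only once and is not accepted. A fuller implementation would likely restrict candidates using part-of-speech tags or dependency relations rather than a fixed word window.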
Experiments were conducted to evaluate the system and the proposed methods. Based on an analysis of the results, we discuss the drawbacks of our current approach and directions for future research. Finally, a fully functioning prototype was created to demonstrate the end-to-end system.