Vox pop: Automated opinion detection and classification with data clustering
Main Authors: | , , |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository 2010 |
Subjects: | |
Online Access: | https://animorepository.dlsu.edu.ph/etd_bachelors/10133 |
Institution: | De La Salle University |
Summary: | A large volume of opinions, such as those found in blogs, forums and product reviews, is uploaded daily as internet technology progresses. However, these data bring more inconvenience than benefit because they lack organization; they are also difficult to find and underutilized. With Natural Language Processing, it is possible to organize these data so that they can aid decision and policy making. This paper focuses on the development of a system that uses text processing techniques to organize the sentiments of public commentaries.
Current systems are able to differentiate facts from opinions and to classify these opinions by polarity. Clustering has also been done based on the words used. The system, Vox Pop, performs these three functions, namely opinion detection, polarity classification and clustering, using a rule-based approach. Opinions are classified by computing polarity from scores produced by SentiWordNet. Commentaries are clustered by computing the Euclidean distance of each word. SentiWordNet, MontyTagger and the K-Means clustering algorithm are some of the resources and tools used by the system. Expert and non-expert evaluations were done in order to test the system. The detection, classification and clustering modules have accuracy rates of 50.5% and 53.85% respectively. |
---|
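
The summary names two computations: lexicon-based polarity scoring and K-Means clustering over Euclidean distance. The sketch below illustrates the general idea only and is not the Vox Pop implementation. The `TOY_LEXICON` scores are invented stand-ins for SentiWordNet entries, the bag-of-words vectors are an assumed representation, and every function name is hypothetical.

```python
import math
from collections import Counter

# Hypothetical (positive, negative) scores standing in for SentiWordNet entries.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "slow": (0.0, 0.25),
}


def polarity(comment):
    """Label a comment by summing (positive - negative) over known words."""
    score = 0.0
    for word in comment.lower().split():
        pos, neg = TOY_LEXICON.get(word, (0.0, 0.0))
        score += pos - neg
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def vectorize(comments):
    """Turn comments into word-count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for c in comments for w in c.lower().split()})
    vectors = []
    for c in comments:
        counts = Counter(c.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=10):
    """Plain K-Means; centroids start at the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


comments = [
    "tax tax budget",
    "road traffic road",
    "tax budget budget",
    "traffic road traffic",
]
labels = kmeans(vectorize(comments), k=2)
```

With this toy input, comments about the same topic share most of their words, so they end up in the same cluster. A real system would read scores from SentiWordNet and would likely use a less naive centroid initialization.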