Building a Filipino polarity lexicon from a sentence-level annotated dataset using various statistical measures
The internet is populated with content on a wide range of topics, from products and services to events and experiences. This content carries its authors' sentiments or opinions, describing the benefits and limitations of products and the first-hand experiences of users, which are potentially useful f...
Main Author: | Co, Justin L. |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository 2013 |
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/4584 |
Institution: | De La Salle University |
id |
oai:animorepository.dlsu.edu.ph:etd_masteral-11422 |
---|---|
record_format |
eprints |
spelling |
oai:animorepository.dlsu.edu.ph:etd_masteral-11422 2021-01-27T01:28:29Z Building a Filipino polarity lexicon from a sentence-level annotated dataset using various statistical measures Co, Justin L. 2013-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4584 Master's Theses English Animo Repository |
institution |
De La Salle University |
building |
De La Salle University Library |
continent |
Asia |
country |
Philippines |
content_provider |
De La Salle University Library |
collection |
DLSU Institutional Repository |
language |
English |
description |
The internet is populated with content on a wide range of topics, from products and services to events and experiences. This content carries its authors' sentiments or opinions, describing the benefits and limitations of products and the first-hand experiences of users, which are potentially useful for both consumers and product manufacturers. Sentiment classification can be applied to this content to extract and identify the authors' opinions. Although a number of sentiment analysis tasks have already been conducted on content extracted from the internet, there is still minimal work on the topic for the Filipino language, a gap compounded by the lack of sentiment analysis resources for the language. This research presents FilSentiNet2, a lexical resource containing Filipino word entries annotated with polarity and subjectivity scores, intended to aid sentiment analysis for the Filipino language. When used as a lexical resource for sentence-level sentiment classification, the lexicon achieved its highest accuracy of 42.58% with individual scores and 44.62% with consolidated scores. These accuracies were achieved using TF-IDF and PolarityAverageW2, respectively, as the scoring algorithms, with FilSentiNet SWL used to filter out non-subjective terms and without negation handling or stemming. Although score consolidation generally improved accuracy, the improvement of only about 2 percentage points is too small to be significant. Evaluation was also conducted at the word level, on the lexicon itself, where the highest accuracy of 49.03% was obtained with PMI as the scoring algorithm. The difference in the best-performing scoring algorithm between sentence-level and word-level evaluation can be attributed to the nature of the algorithms. TF-IDF measures the importance of a word to a document, which enabled it to perform better when used for classification. PMI, on the other hand, measures the strength of association between a word and a class, which makes it a better representative of the sentiment of the word. |
format |
text |
author |
Co, Justin L. |
title |
Building a Filipino polarity lexicon from a sentence-level annotated dataset using various statistical measures |
publisher |
Animo Repository |
publishDate |
2013 |
url |
https://animorepository.dlsu.edu.ph/etd_masteral/4584 |
_version_ |
1772836009471377408 |
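
The abstract describes PMI as a measure of how strongly a word is associated with a class, and describes using the resulting lexicon for sentence-level classification. The sketch below illustrates that general idea only. The record does not give the thesis's exact PMI formulation, smoothing, tokenization, or classification rule, so the function names (`pmi_polarity_scores`, `classify_sentence`), the add-one smoothing, the "sum of word scores" rule, and the toy sentences are all assumptions made for illustration and are not taken from the thesis or its dataset.

```python
import math
from collections import Counter


def pmi_polarity_scores(labeled_sentences, smoothing=1.0):
    """Score each word as PMI(word, positive) - PMI(word, negative).

    labeled_sentences: iterable of (tokens, label) pairs, where label is
    "positive" or "negative". Counts are sentence-level presence counts.
    smoothing must be > 0 so that unseen (word, label) pairs stay finite.
    """
    word_label = Counter()   # (word, label) -> sentences with that label containing word
    word_counts = Counter()  # word -> sentences containing word
    label_counts = Counter() # label -> number of sentences
    for tokens, label in labeled_sentences:
        label_counts[label] += 1
        for word in set(tokens):  # count each word once per sentence
            word_label[(word, label)] += 1
            word_counts[word] += 1
    total = sum(label_counts.values())

    def pmi(word, label):
        # PMI(w, c) = log2( P(w | c) / P(w) ), with additive smoothing.
        p_word_given_label = (word_label[(word, label)] + smoothing) / (
            label_counts[label] + 2 * smoothing
        )
        p_word = (word_counts[word] + smoothing) / (total + 2 * smoothing)
        return math.log2(p_word_given_label / p_word)

    return {w: pmi(w, "positive") - pmi(w, "negative") for w in word_counts}


def classify_sentence(tokens, lexicon):
    """Label a sentence by the sign of the summed word scores (assumed rule)."""
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Made-up toy data for illustration; not from the thesis dataset.
TOY_SENTENCES = [
    (["maganda", "ang", "produkto"], "positive"),
    (["maganda", "ang", "serbisyo"], "positive"),
    (["pangit", "ang", "produkto"], "negative"),
    (["pangit", "ang", "serbisyo"], "negative"),
]

scores = pmi_polarity_scores(TOY_SENTENCES)
print(sorted(scores.items(), key=lambda item: item[1]))
print(classify_sentence(["maganda", "ang", "serbisyo"], scores))
```

In this toy setting, words that occur only in positive sentences receive positive scores, words that occur only in negative sentences receive negative scores, and words spread evenly across both classes score zero, which is the sense in which PMI reflects the sentiment of the word itself rather than its importance to a particular document.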