Analyzing Filipino editorials through information extraction and sentiment analysis
The purpose of this research is to enable easy data analysis by performing information extraction and sentiment analysis on Filipino editorials. Information extraction was guided by rules based on the researcher's observations and was automated through bootstrapping. The attributes that were extracted...
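The record's full abstract names five Tagalog attributes (sino, ano, kailan, saan, bakit), extracted per paragraph by rules and scored as a correctness percentage against a gold standard, but it does not give the rules themselves. As a minimal sketch, assuming a few hypothetical regular-expression rules for kailan (when), paragraph-level extraction and per-feature scoring could look like this. `KAILAN_RULES`, `extract_kailan`, and the exact-set matching in `correctness` are illustrative assumptions, not the thesis's method, and the bootstrapping step is not modelled.

```python
import re

# Hypothetical rules for illustration only; the thesis's actual rules are not
# given in this record. Each pattern targets one kind of Tagalog time expression.
MONTHS = (
    "Enero|Pebrero|Marso|Abril|Mayo|Hunyo|Hulyo|Agosto|"
    "Setyembre|Oktubre|Nobyembre|Disyembre"
)
DAYS = "Lunes|Martes|Miyerkules|Huwebes|Biyernes|Sabado|Linggo"

KAILAN_RULES = [
    # "noong Enero 5, 2015", "sa Marso": marker + month, optional day and year
    re.compile(
        rf"\b(?:noong|sa)\s+(?:{MONTHS})\b(?:\s+\d{{1,2}}\b)?(?:,\s*\d{{4}})?",
        re.IGNORECASE,
    ),
    # "noong Lunes", "sa Biyernes": marker + day of the week
    re.compile(rf"\b(?:noong|sa)\s+(?:{DAYS})\b", re.IGNORECASE),
    # bare relative time words: yesterday, today/now, tomorrow
    re.compile(r"\b(?:kahapon|ngayon|bukas)\b", re.IGNORECASE),
]


def extract_kailan(paragraph: str) -> list[str]:
    """Apply every rule to one paragraph; return matches in reading order, deduplicated."""
    hits = []
    for rule in KAILAN_RULES:
        for m in rule.finditer(paragraph):
            hits.append((m.start(), m.group(0)))
    seen, result = set(), []
    for _, text in sorted(hits):
        if text.lower() not in seen:
            seen.add(text.lower())
            result.append(text)
    return result


def correctness(predicted: list[list[str]], gold: list[list[str]]) -> float:
    """Percentage of paragraphs whose extracted set equals the gold-standard set.

    Exact set matching is an assumed scoring rule, not necessarily the thesis's.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned paragraph by paragraph")
    if not gold:
        return 0.0
    matches = sum(
        {p.lower() for p in pred} == {g.lower() for g in ref}
        for pred, ref in zip(predicted, gold)
    )
    return 100.0 * matches / len(gold)
```

For example, `extract_kailan("Inilabas ang ulat noong Enero 5, 2015.")` returns `['noong Enero 5, 2015']`. Regular expressions suit kailan because time expressions follow fairly fixed surface patterns; open-ended attributes such as ano or bakit are much harder to capture this way.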
Main Author: | Cagampan, Bernadyn Reyes |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2015 |
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/4819 |
Institution: | De La Salle University |
id |
oai:animorepository.dlsu.edu.ph:etd_masteral-11657 |
---|---|
record_format |
eprints |
spelling |
oai:animorepository.dlsu.edu.ph:etd_masteral-11657 2021-02-05T08:21:00Z Analyzing Filipino editorials through information extraction and sentiment analysis Cagampan, Bernadyn Reyes The purpose of this research is to enable easy data analysis by performing information extraction and sentiment analysis on Filipino editorials. Information extraction was guided by rules based on the researcher's observations and was automated through bootstrapping. The attributes that were extracted are the Tagalog equivalents of the 5W user requirement proposed by Das et al. (2012): sino (who), ano (what), kailan (when), saan (where), and bakit (why). Subsequently, comparative experiments on sentiment analysis were conducted using machine learning and lexicon-based approaches. Both information extraction and sentiment analysis were performed at the paragraph level. The collective results were presented visually. In developing the visualization, several factors were considered, including how the end user would be able to comprehend the important points in the editorials and the overall sentiment present in each. The three main components of the research process, namely information extraction, sentiment analysis, and result visualization, were evaluated both objectively and subjectively. To evaluate the performance of rule-based information extraction, a gold standard was built against which the machine output was compared. The approach performed below average in extracting the ano, sino, and saan features, with correctness percentages of 0%, 6.06%, and 19.51%, respectively. It performed at an average level in extracting the bakit feature, with 50% correct extraction. The highest result, 84.39%, was recorded for kailan feature extraction. The performance of lexicon-based and machine learning-based sentiment analysis was also compared in this research.
Machine learning-based sentiment analysis was shown to perform well on larger data sets, attaining a classification accuracy of 80.98% compared with the 61% accuracy of the lexicon-based approach. The lexicon-based approach also showed its potential, obtaining an accuracy of 87.71% against the 70.5% accuracy of the machine learning-based approach on a balanced data set with only a few instances. The visualization elements that represented the output of the two major processes of this research were evaluated as appropriate representations. The visualization system was also subjectively rated as easy to use and understand. 2015-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4819 Master's Theses English Animo Repository |
institution |
De La Salle University |
building |
De La Salle University Library |
continent |
Asia |
country |
Philippines |
content_provider |
De La Salle University Library |
collection |
DLSU Institutional Repository |
language |
English |
description |
The purpose of this research is to enable easy data analysis by performing information extraction and sentiment analysis on Filipino editorials. Information extraction was guided by rules based on the researcher's observations and was automated through bootstrapping. The attributes that were extracted are the Tagalog equivalents of the 5W user requirement proposed by Das et al. (2012): sino (who), ano (what), kailan (when), saan (where), and bakit (why). Subsequently, comparative experiments on sentiment analysis were conducted using machine learning and lexicon-based approaches. Both information extraction and sentiment analysis were performed at the paragraph level. The collective results were presented visually. In developing the visualization, several factors were considered, including how the end user would be able to comprehend the important points in the editorials and the overall sentiment present in each. The three main components of the research process, namely information extraction, sentiment analysis, and result visualization, were evaluated both objectively and subjectively.
To evaluate the performance of rule-based information extraction, a gold standard was built against which the machine output was compared. The approach performed below average in extracting the ano, sino, and saan features, with correctness percentages of 0%, 6.06%, and 19.51%, respectively. It performed at an average level in extracting the bakit feature, with 50% correct extraction. The highest result, 84.39%, was recorded for kailan feature extraction. The performance of lexicon-based and machine learning-based sentiment analysis was also compared in this research. Machine learning-based sentiment analysis was shown to perform well on larger data sets, attaining a classification accuracy of 80.98% compared with the 61% accuracy of the lexicon-based approach. The lexicon-based approach also showed its potential, obtaining an accuracy of 87.71% against the 70.5% accuracy of the machine learning-based approach on a balanced data set with only a few instances. The visualization elements that represented the output of the two major processes of this research were evaluated as appropriate representations. The visualization system was also subjectively rated as easy to use and understand. |
format |
text |
author |
Cagampan, Bernadyn Reyes |
title |
Analyzing Filipino editorials through information extraction and sentiment analysis |
publisher |
Animo Repository |
publishDate |
2015 |
url |
https://animorepository.dlsu.edu.ph/etd_masteral/4819 |
_version_ |
1775631184631431168 |
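The abstract compares lexicon-based and machine learning-based sentiment analysis at the paragraph level but does not name the lexicon or the learning algorithm. The following is a minimal sketch of such a comparison under stated assumptions: `LEXICON` is a tiny hypothetical Tagalog polarity list, multinomial Naive Bayes is swapped in as a stand-in machine learning classifier, and `accuracy` is the plain percentage of correctly labelled paragraphs.

```python
import math
import re
from collections import Counter

# Hypothetical polarity lexicon for illustration; the lexicon used in the
# thesis is not described in this record.
LEXICON = {
    "mabuti": 1, "maganda": 1, "tagumpay": 1, "pag-asa": 1,
    "masama": -1, "korapsyon": -1, "kabiguan": -1, "krisis": -1,
}

# Lowercase word tokens, keeping hyphenated Tagalog words such as "pag-asa" whole.
TOKEN = re.compile(r"[a-zñ]+(?:-[a-zñ]+)*")


def tokenize(paragraph: str) -> list[str]:
    return TOKEN.findall(paragraph.lower())


def lexicon_label(paragraph: str) -> str:
    """Lexicon-based: the sign of the summed word polarities decides the label."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(paragraph))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Machine learning stand-in: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, paragraphs: list[str], labels: list[str]) -> "NaiveBayes":
        self.priors = Counter(labels)
        self.counts = {label: Counter() for label in self.priors}
        for text, label in zip(paragraphs, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, paragraph: str) -> str:
        total = sum(self.priors.values())
        best_label, best_score = None, -math.inf
        for label, n in self.priors.items():
            words = self.counts[label]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokenize(paragraph):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((words[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Percentage of paragraphs whose predicted label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally long, non-empty label lists")
    return 100.0 * sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

A lexicon-based classifier needs no labelled training paragraphs, while the learned classifier depends on them. That general trade-off is one plausible reading of the reported accuracies (stronger machine learning results on the larger data set, stronger lexicon results on the small balanced one), not a conclusion stated in the record.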