Analyzing Filipino editorials through information extraction and sentiment analysis

The purpose of this research is to allow easy data analysis upon performing information extraction and sentiment analysis on Filipino editorials. Information extraction was guided by rules based on the researchers' observations and was automated through bootstrapping. The attributes that were extracted...

Bibliographic Details
Main Author: Cagampan, Bernadyn Reyes
Format: text
Language: English
Published: Animo Repository 2015
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/4819
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-11657
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_masteral-11657 2021-02-05T08:21:00Z Analyzing Filipino editorials through information extraction and sentiment analysis Cagampan, Bernadyn Reyes 2015-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4819 Master's Theses English Animo Repository
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
description The purpose of this research is to allow easy data analysis upon performing information extraction and sentiment analysis on Filipino editorials. Information extraction was guided by rules based on the researchers' observations and was automated through bootstrapping. The attributes extracted are the Tagalog equivalents of the 5W user requirement proposed by Das et al. (2012): sino (who), ano (what), kailan (when), saan (where), and bakit (why). In addition, comparative experiments on sentiment analysis were done using machine learning and lexicon-based approaches. Both information extraction and sentiment analysis were done at the paragraph level. The collective results were presented visually. In developing the visualization, several factors were considered, including how well the end user would be able to comprehend the important points in the editorials and the overall sentiment present in each. The three main components of the research process, namely information extraction, sentiment analysis, and result visualization, were evaluated objectively and subjectively. To evaluate the performance of rule-based information extraction, a gold standard was built against which the machine output was compared. The approach performed below average in extracting the ano, sino, and saan features, with correctness percentages of 0%, 6.06%, and 19.51% respectively. It performed at an average level in extracting the bakit feature, with 50% correct extractions. The highest result recorded was 84.39%, in kailan feature extraction. The performance of lexicon-based and machine learning-based sentiment analysis was also compared in this research. Machine learning-based sentiment analysis was shown to perform well on bigger data sets, attaining a classification accuracy of 80.98% compared with the 61% accuracy of the lexicon-based approach. The lexicon-based approach also showed its potential, obtaining an accuracy of 87.71% against the 70.5% accuracy of the machine learning-based approach on a balanced data set with only a few instances. The visualization elements that represented the output of the two major processes of this research were evaluated to be appropriate representations. The visualization system was also subjectively rated as easy to use and understand.
format text
author Cagampan, Bernadyn Reyes
spellingShingle Cagampan, Bernadyn Reyes
Analyzing Filipino editorials through information extraction and sentiment analysis
author_facet Cagampan, Bernadyn Reyes
author_sort Cagampan, Bernadyn Reyes
title Analyzing Filipino editorials through information extraction and sentiment analysis
title_sort analyzing filipino editorials through information extraction and sentiment analysis
publisher Animo Repository
publishDate 2015
url https://animorepository.dlsu.edu.ph/etd_masteral/4819
_version_ 1775631184631431168
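
The description above reports a "correctness percentage" per 5W attribute, obtained by comparing machine output against a gold standard. The record does not say how a match was decided, so the following is only a minimal sketch of one plausible reading: an extraction counts as correct when it equals the gold value after case and whitespace normalization. The function names, the exact-match criterion, and the toy paragraphs are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict

# The five attributes named in the abstract (Tagalog 5W).
ATTRIBUTES = ("sino", "ano", "kailan", "saan", "bakit")


def normalize(text):
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(text.lower().split())


def correctness_by_attribute(gold, predicted):
    """Return {attribute: percent of gold values reproduced exactly}.

    `gold` and `predicted` are parallel lists with one dict per paragraph,
    mapping attribute name to the annotated or extracted string.
    Attributes with no gold annotation are left out of the result.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold_par, pred_par in zip(gold, predicted):
        for attr in ATTRIBUTES:
            if attr not in gold_par:
                continue  # nothing to compare against for this paragraph
            total[attr] += 1
            if normalize(pred_par.get(attr, "")) == normalize(gold_par[attr]):
                correct[attr] += 1
    return {a: 100.0 * correct[a] / total[a] for a in ATTRIBUTES if total[a]}


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the thesis corpus.
    gold = [
        {"sino": "Pangulo", "kailan": "Lunes"},
        {"sino": "Senado", "kailan": "2014"},
    ]
    predicted = [
        {"sino": "pangulo", "kailan": "Lunes"},
        {"sino": "Kongreso", "kailan": "2014"},
    ]
    for attr, pct in correctness_by_attribute(gold, predicted).items():
        print(f"{attr}: {pct:.2f}%")
```

A looser criterion, such as partial overlap between the extracted span and the gold span, would give different percentages; exact match is used here only because it is the simplest rule to state.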