Analyzing Filipino editorials through information extraction and sentiment analysis

This research aims to make Filipino editorials easier to analyze by applying information extraction and sentiment analysis to them. Information extraction was guided by rules based on the researchers' observations and was automated through bootstrapping. The attributes that were extracted...

Bibliographic Details
Main Author:	Cagampan, Bernadyn Reyes
Format:	text
Language:	English
Published:	Animo Repository 2015
Online Access:	https://animorepository.dlsu.edu.ph/etd_masteral/4819
Institution:	De La Salle University

id	oai:animorepository.dlsu.edu.ph:etd_masteral-11657
record_format	eprints
spelling	oai:animorepository.dlsu.edu.ph:etd_masteral-11657 2021-02-05T08:21:00Z Analyzing Filipino editorials through information extraction and sentiment analysis Cagampan, Bernadyn Reyes 2015-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4819 Master's Theses English Animo Repository
institution	De La Salle University
building	De La Salle University Library
continent	Asia
country	Philippines
content_provider	De La Salle University Library
collection	DLSU Institutional Repository
language	English
description	This research aims to make Filipino editorials easier to analyze by applying information extraction and sentiment analysis to them. Information extraction was guided by rules based on the researchers' observations and was automated through bootstrapping. The extracted attributes are the Tagalog equivalents of the 5W user requirement proposed by Das et al. (2012): sino (who), ano (what), kailan (when), saan (where), and bakit (why). Comparative experiments on sentiment analysis were then conducted using machine learning and lexicon-based approaches. Both information extraction and sentiment analysis were performed at the paragraph level, and the collective results were presented visually. In developing the visualization, several factors were considered, including how well the end user would be able to comprehend the important points of each editorial and its overall sentiment. The three main components of the research process, namely information extraction, sentiment analysis, and result visualization, were evaluated objectively and subjectively. To evaluate rule-based information extraction, a gold standard was built against which the machine output was compared. The approach performed below average in extracting the ano, sino, and saan features, with correctness percentages of 0%, 6.06%, and 19.51% respectively. It performed at an average level on the bakit feature, with 50% correct extraction. The highest result recorded was 84.39%, for kailan feature extraction. The performance of lexicon-based and machine learning-based sentiment analysis was also compared. Machine learning-based sentiment analysis performed well on the bigger data set, attaining a classification accuracy of 80.98% compared with the 61% accuracy of the lexicon-based approach. The lexicon-based approach also showed its potential on a balanced data set with only a few instances, obtaining an accuracy of 87.71% against the 70.5% accuracy of the machine learning-based approach. The visualization elements representing the output of the two major processes were evaluated to be appropriate representations, and the visualization system was subjectively rated easy to use and understand.
format	text
author	Cagampan, Bernadyn Reyes
title	Analyzing Filipino editorials through information extraction and sentiment analysis
publisher	Animo Repository
publishDate	2015
url	https://animorepository.dlsu.edu.ph/etd_masteral/4819
_version_	1775631184631431168
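
The description in this record mentions a lexicon-based sentiment approach applied at the paragraph level and an accuracy figure obtained by comparing output against reference labels. The following is only a minimal sketch of that general idea, not the thesis's implementation: the toy lexicon `LEXICON`, the three polarity labels, and the function names `classify_paragraph` and `accuracy` are all hypothetical and do not come from the record.

```python
import re

# Hypothetical toy polarity lexicon (assumption for illustration only);
# the lexicon actually used in the thesis is not given in this record.
LEXICON = {
    "mahusay": 1,     # excellent
    "maganda": 1,     # good, beautiful
    "tagumpay": 1,    # success
    "masama": -1,     # bad
    "problema": -1,   # problem
    "korapsyon": -1,  # corruption
}


def classify_paragraph(paragraph, lexicon=LEXICON):
    """Label one paragraph by summing the polarity of its known words."""
    tokens = re.findall(r"\w+", paragraph.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed fallback when no polarity dominates


def accuracy(predicted, gold):
    """Percentage of predicted labels that match the reference labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    paragraphs = [
        "Mahusay ang bagong programa.",
        "Malaking problema ang korapsyon.",
    ]
    predicted = [classify_paragraph(p) for p in paragraphs]
    print(predicted, accuracy(predicted, ["positive", "negative"]))
```

A summed-polarity rule is the simplest form of lexicon-based classification; a real system would also need to handle negation, affixation, and words missing from the lexicon.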
