Towards a bilingual sentiment analysis model for English and Filipino

Bibliographic Details
Main Author: De Leon, Marlene
Format: text
Published: Archīum Ateneo 2013
Online Access: https://archium.ateneo.edu/theses-dissertations/226
http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=234946739&currentIndex=0&view=fullDetailsDetailsTab
Institution: Ateneo De Manila University
Library: Ateneo De Manila University Library
Country: Philippines
Collection: Theses and Dissertations (All), Archīum Ateneo Institutional Repository
Subjects: Computational linguistics -- Case studies; Corpora (Linguistics); Code switching (Linguistics); Artificial intelligence; Computer Engineering
Description:
There is an opportunity to learn and understand how Filipinos think, behave, and react online, especially in response to significant events. Resources in a target language, such as lexicons, corpora, or a combination of both, together with a selection of machine learning classifiers, may be used to address this opportunity. However, there is little work on bilingual conversations. Filipino tweets provide a rich source of data for building corpora and models for this kind of classification, as they are composed of a mixture of English and, mostly, Filipino terms.

This study looked into building bilingual sentiment analysis models for classifying English-Filipino disaster tweets. It applied a supervised learning approach to build subjectivity and sentiment models using Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbor (K-NN) classifiers, trained on a bilingual English and Filipino lexicon, corpora, and a combination of the two in fixed distribution sets. Accuracy, precision, recall, and F-measure were used to evaluate the performance of the models. Each resulting model was further evaluated against manually annotated corpora of tweets to determine its performance and reliability.

For the bilingual subjectivity classification model, performance was highest with Naïve Bayes using the combination of lexicon and corpora at an imbalanced distribution of 95% objective to 5% subjective, with an F-measure of 73.53%. Similarly, the bilingual sentiment classification model performed best with Naïve Bayes using the combination of lexicon and corpora at 95% positive to 5% negative, with an F-measure of 72.41%. The study showed that for English-Filipino sentiment, bilingual classification works best with an imbalanced distribution scheme and a combination of lexicon and corpora data sets.

Principal component analysis (PCA) was further performed on the resulting positive and negative sentiments to obtain manifest constructs of sentiment. The results showed a promising possibility of extending the bilingual sentiment classification model to include specific positive and negative emotions.
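The supervised approach summarized above (a Naïve Bayes classifier over combined lexicon and corpus evidence, scored with precision, recall, and F-measure) can be illustrated with a minimal sketch. It rests on stated assumptions and is not the thesis's implementation: the tiny English-Filipino polarity lexicon and the four training tweets are invented for illustration, lexicon hits are appended as hypothetical pseudo-tokens (`LEX_POS`, `LEX_NEG`) as one possible way to combine lexicon and corpus features, and the classifier is a from-scratch multinomial Naïve Bayes with Laplace smoothing rather than any particular library. The `prf` helper computes per-class precision, recall, and the balanced F-measure (F1). The sketch does not model the fixed distribution sets, the SVM and K-NN comparisons, or the PCA step.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy bilingual polarity lexicon (English and Filipino words).
# Invented for illustration; NOT the lexicon built in the thesis.
LEXICON = {
    "salamat": "pos", "thank": "pos", "ligtas": "pos", "safe": "pos",
    "baha": "neg", "flood": "neg", "takot": "neg", "scared": "neg",
}


def features(text):
    """Tokenize a code-switched tweet (simple ASCII-only tokenizer) and append
    one lexicon pseudo-token per lexicon hit, so corpus (bag-of-words) and
    lexicon evidence share a single feature space."""
    tokens = re.findall(r"[a-z']+", text.lower())
    lex_hits = ["LEX_" + LEXICON[t].upper() for t in tokens if t in LEXICON]
    return tokens + lex_hits


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(features(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in features(doc):
                if w not in self.vocab:
                    continue  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][w] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best, best_score = c, score
        return best


def prf(gold, pred, positive):
    """Precision, recall, and balanced F-measure (F1) for one target class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented code-switched example tweets (illustrative only, not thesis data).
train = [
    ("Salamat sa rescue team, ligtas na kami", "pos"),
    ("Thank you po, we are safe now", "pos"),
    ("Grabe ang baha dito, takot na kami", "neg"),
    ("Flood is rising, scared for my family", "neg"),
]
test = [("ligtas na po kami, salamat", "pos"), ("baha na naman, scared", "neg")]

model = MultinomialNB().fit([t for t, _ in train], [y for _, y in train])
predictions = [model.predict(t) for t, _ in test]
print(predictions)                                    # ['pos', 'neg']
print(prf([y for _, y in test], predictions, "pos"))  # (1.0, 1.0, 1.0)
```

The same structure would apply to the subjectivity model by swapping the labels for objective and subjective. Appending lexicon hits as extra tokens is only one simple way to combine the two resources; a real system could instead weight or encode lexicon scores as separate numeric features.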