Towards a bilingual sentiment analysis model for English and Filipino
There is an opportunity to learn and understand how Filipinos think, behave and react online, especially in responding to significant events. Resources such as lexicons and corpora, or a combination of the two in a target language, as well as selected machine learning classifiers, may be used to address this o...
Main Author: | MARLENE, DE LEON |
---|---|
Format: | text |
Published: | Archīum Ateneo, 2013 |
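Subjects: | Computational linguistics -- Case studies; Corpora (Linguistics); Code switching (Linguistics); Artificial intelligence; Computer Engineering |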
Online Access: | https://archium.ateneo.edu/theses-dissertations/226 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=234946739&currentIndex=0&view=fullDetailsDetailsTab |
Institution: | Ateneo De Manila University |
id | ph-ateneo-arc.theses-dissertations-1352 |
---|---|
record_format | eprints |
spelling | ph-ateneo-arc.theses-dissertations-1352 2021-07-06T02:19:47Z Towards a bilingual sentiment analysis model for English and Filipino MARLENE, DE LEON There is an opportunity to learn and understand how Filipinos think, behave and react online, especially in responding to significant events. Resources such as lexicons and corpora, or a combination of the two in a target language, as well as selected machine learning classifiers, may be used to address this opportunity. However, there is little work on bilingual conversations. Filipino tweets provide a rich source of data for building corpora and models for this kind of classification, as they are composed of a mixture of English and mostly Filipino terms. This study looked into building bilingual sentiment analysis models for classifying bilingual English and Filipino disaster tweets. The study applied a supervised learning approach to build subjective and sentiment models, using Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbor (K-NN) classifiers with a bilingual English and Filipino lexicon, corpora, and a combination of the two in fixed distribution sets. Accuracy, precision, recall and F-measure were used to evaluate the performance of the models. Each of the resulting models was further evaluated against manually annotated corpora of tweets to determine its performance and reliability. For the bilingual subjective classification model, performance was highest with Naïve Bayes, using the combination of lexicon and corpora, at a 95% objective-5% subjective imbalanced distribution, with an F-measure of 73.53%. Similarly, the bilingual sentiment classification model performed best with Naïve Bayes, using the combination of lexicon and corpora, at a 95% positive-5% negative distribution, with an F-measure of 72.41%. The study showed that for English-Filipino sentiments, bilingual classification works best with an imbalanced distribution scheme and a combination of lexicon and corpora data sets. Principal component analysis (PCA) was then performed on the resulting positive and negative sentiments to obtain manifest constructs on sentiments. Results showed a promising possibility of extending the bilingual sentiment classification model further to include specific positive and negative emotions. 2013-01-01T08:00:00Z text https://archium.ateneo.edu/theses-dissertations/226 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=234946739&currentIndex=0&view=fullDetailsDetailsTab Theses and Dissertations (All) Archīum Ateneo Computational linguistics -- Case studies Corpora (Linguistics) Code switching (Linguistics) Artificial intelligence Computer Engineering |
institution | Ateneo De Manila University |
building | Ateneo De Manila University Library |
continent | Asia |
country | Philippines |
content_provider | Ateneo De Manila University Library |
collection | archium.Ateneo Institutional Repository |
topic | Computational linguistics -- Case studies; Corpora (Linguistics); Code switching (Linguistics); Artificial intelligence; Computer Engineering |
spellingShingle | Computational linguistics -- Case studies Corpora (Linguistics) Code switching (Linguistics) Artificial intelligence Computer Engineering MARLENE, DE LEON Towards a bilingual sentiment analysis model for English and Filipino |
description | There is an opportunity to learn and understand how Filipinos think, behave and react online, especially in responding to significant events. Resources such as lexicons and corpora, or a combination of the two in a target language, as well as selected machine learning classifiers, may be used to address this opportunity. However, there is little work on bilingual conversations. Filipino tweets provide a rich source of data for building corpora and models for this kind of classification, as they are composed of a mixture of English and mostly Filipino terms. This study looked into building bilingual sentiment analysis models for classifying bilingual English and Filipino disaster tweets. The study applied a supervised learning approach to build subjective and sentiment models, using Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbor (K-NN) classifiers with a bilingual English and Filipino lexicon, corpora, and a combination of the two in fixed distribution sets. Accuracy, precision, recall and F-measure were used to evaluate the performance of the models. Each of the resulting models was further evaluated against manually annotated corpora of tweets to determine its performance and reliability. For the bilingual subjective classification model, performance was highest with Naïve Bayes, using the combination of lexicon and corpora, at a 95% objective-5% subjective imbalanced distribution, with an F-measure of 73.53%. Similarly, the bilingual sentiment classification model performed best with Naïve Bayes, using the combination of lexicon and corpora, at a 95% positive-5% negative distribution, with an F-measure of 72.41%. The study showed that for English-Filipino sentiments, bilingual classification works best with an imbalanced distribution scheme and a combination of lexicon and corpora data sets. Principal component analysis (PCA) was then performed on the resulting positive and negative sentiments to obtain manifest constructs on sentiments. Results showed a promising possibility of extending the bilingual sentiment classification model further to include specific positive and negative emotions. |
format | text |
author | MARLENE, DE LEON |
author_facet | MARLENE, DE LEON |
author_sort | MARLENE, DE LEON |
title | Towards a bilingual sentiment analysis model for English and Filipino |
title_short | Towards a bilingual sentiment analysis model for English and Filipino |
title_full | Towards a bilingual sentiment analysis model for English and Filipino |
title_fullStr | Towards a bilingual sentiment analysis model for English and Filipino |
title_full_unstemmed | Towards a bilingual sentiment analysis model for English and Filipino |
title_sort | towards a bilingual sentiment analysis model for english and filipino |
publisher | Archīum Ateneo |
publishDate | 2013 |
url | https://archium.ateneo.edu/theses-dissertations/226 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=234946739&currentIndex=0&view=fullDetailsDetailsTab |
_version_ | 1712577819686469632 |