Development of Bilingual Sentiment and Emotion Text Classification Models from COVID-19 Vaccination Tweets in the Philippines

Bibliographic Details
Main Authors: Co, Nicole Allison S; Estuar, Ma. Regina Justina; Tan, Hans Calvin L; Tan, Austin Sebastien; Abao, Roland P; Aureus, Jelly P
Format: text
Published: Archīum Ateneo 2022
Subjects:
NLP
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/339
https://doi.org/10.1007/978-3-031-05061-9_18
Institution: Ateneo De Manila University
Description
Summary: Social media can be used to understand how the public is responding to the ongoing nationwide COVID-19 vaccination campaign, allowing policymakers to respond effectively through informed decisions. However, conducting social media analysis in the Philippine context is challenging because natural informal conversations combine English with local languages. This study addresses this challenge by including part-of-speech tags, frequency of code switching, and language dominance features to represent bilingualism when training machine learning models on COVID-19 vaccination-related Tweets for sentiment and emotion analysis. Results showed that the English-Tagalog Logistic Regression sentiment classification model performed better than TextBlob, VADER, and Polyglot, with an accuracy of 70.36%. Similarly, the English-Tagalog SVM emotion classification model performed better than Text2emotion, NRC Affect Intensity Lexicon, and EmoTFIDF, with an average mean-squared error of 0.049. The added bilingual features improved these performance metrics only by a small margin. Nevertheless, SHAP analysis revealed that sentiment and emotion classes exhibit varying levels of these bilingual features, which suggests that future studies could explore similar linguistic features to better distinguish between classes during text classification. Finally, Tweets from September 2021 to January 2022 show negative perceptions of COVID-19 vaccination, mainly anger and sadness.
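The summary names frequency of code switching and language dominance as bilingual features but does not define them. The following is a minimal sketch of one plausible way to compute such features, assuming each token of a Tweet has already been labeled with a language tag (`"en"` for English, `"tl"` for Tagalog). The function names, the tag values, and both formulas are illustrative assumptions, not the authors' definitions.

```python
from typing import List


def code_switch_frequency(lang_tags: List[str]) -> float:
    """Fraction of adjacent token pairs whose language tags differ.

    Assumed definition for illustration only; the study may count
    switches differently.
    """
    if len(lang_tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(lang_tags, lang_tags[1:]) if a != b)
    return switches / (len(lang_tags) - 1)


def language_dominance(lang_tags: List[str], lang: str = "en") -> float:
    """Share of tokens tagged with `lang` (assumed definition)."""
    if not lang_tags:
        return 0.0
    return lang_tags.count(lang) / len(lang_tags)


# Hypothetical token-level tags for a five-token Tweet.
tags = ["en", "en", "tl", "tl", "en"]
features = {
    "code_switch_frequency": code_switch_frequency(tags),
    "english_dominance": language_dominance(tags, "en"),
}
print(features)
```

Under these assumed definitions, values like these could be appended to a text feature vector (for example, alongside part-of-speech tag counts) before training a classifier.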