Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
Format: text
Published: Archīum Ateneo, 2024
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/411 https://doi.org/10.1007/978-981-99-8349-0_11
Institution: Ateneo De Manila University
Summary: Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) target high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), comprising more than 10k product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on code-switched text using OpenAI's GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84, with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based tools illustrates the limitations of systems designed for a single language when applied to bilingual text involving code-switching.
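The abstract reports that the strongest model was a fine-tuned XLM-RoBERTa sentiment classifier. As a rough illustration only, the sketch below shows how such a three-class fine-tune could be set up with the Hugging Face transformers and datasets libraries; the checkpoint name, label mapping, hyperparameters, and the toy code-switched examples are assumptions, not the paper's actual FiReCS data or training configuration.

```python
# Minimal sketch: fine-tuning XLM-RoBERTa for 3-class sentiment
# classification of code-switched Filipino-English text.
# All names, labels, and hyperparameters below are illustrative
# assumptions, not the configuration used in the paper.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

MODEL_NAME = "xlm-roberta-base"          # assumed checkpoint
LABELS = {"negative": 0, "neutral": 1, "positive": 2}  # assumed mapping

# Toy code-switched examples standing in for the FiReCS corpus.
train_data = Dataset.from_dict({
    "text": [
        "Sobrang ganda ng product, highly recommended!",
        "Late yung delivery at sira pa yung item.",
        "Okay lang, nothing special.",
    ],
    "label": [2, 0, 1],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Tokenize review text; padding is handled per batch by the Trainer.
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_data = train_data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

args = TrainingArguments(
    output_dir="xlmr-sentiment",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    tokenizer=tokenizer,
)
trainer.train()
```

In practice the full FiReCS training split would replace the toy examples, and a held-out test split would be used to compute the accuracy and per-class F1-scores reported in the abstract.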