Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models

Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the usage of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS) with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against those of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84 with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools exemplifies the limitations of such systems that are designed for a single language when applied to bilingual text involving code-switching.


Bibliographic Details
Main Authors: Cosme, Camilla Johnine, De Leon, Marlene M.
Format: text
Published: Archīum Ateneo 2024
Subjects:
Online Access:https://archium.ateneo.edu/discs-faculty-pubs/411
https://doi.org/10.1007/978-981-99-8349-0_11
Institution: Ateneo De Manila University
id ph-ateneo-arc.discs-faculty-pubs-1411
record_format eprints
spelling ph-ateneo-arc.discs-faculty-pubs-1411 2024-04-15T08:20:18Z Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models Cosme, Camilla Johnine De Leon, Marlene M. Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the usage of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS) with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against those of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84 with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools exemplifies the limitations of such systems that are designed for a single language when applied to bilingual text involving code-switching. 2024-01-01T08:00:00Z text https://archium.ateneo.edu/discs-faculty-pubs/411 https://doi.org/10.1007/978-981-99-8349-0_11 Department of Information Systems & Computer Science Faculty Publications Archīum Ateneo Code-switching Natural language processing Online reviews Sentiment analysis Transformers Computer Sciences Databases and Information Systems Physical Sciences and Mathematics
institution Ateneo De Manila University
building Ateneo De Manila University Library
continent Asia
country Philippines
content_provider Ateneo De Manila University Library
collection archium.Ateneo Institutional Repository
topic Code-switching
Natural language processing
Online reviews
Sentiment analysis
Transformers
Computer Sciences
Databases and Information Systems
Physical Sciences and Mathematics
spellingShingle Code-switching
Natural language processing
Online reviews
Sentiment analysis
Transformers
Computer Sciences
Databases and Information Systems
Physical Sciences and Mathematics
Cosme, Camilla Johnine
De Leon, Marlene M.
Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
description Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the usage of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS) with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against those of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84 with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools exemplifies the limitations of such systems that are designed for a single language when applied to bilingual text involving code-switching.
format text
author Cosme, Camilla Johnine
De Leon, Marlene M.
author_facet Cosme, Camilla Johnine
De Leon, Marlene M.
author_sort Cosme, Camilla Johnine
title Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
title_short Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
title_full Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
title_fullStr Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
title_full_unstemmed Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
title_sort sentiment analysis of code-switched filipino-english product and service reviews using transformers-based large language models
publisher Archīum Ateneo
publishDate 2024
url https://archium.ateneo.edu/discs-faculty-pubs/411
https://doi.org/10.1007/978-981-99-8349-0_11
_version_ 1797546527389908992
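
Note: this record only summarizes the approach, so the sketch below is not the authors' implementation. Assuming the Hugging Face transformers and datasets libraries, it illustrates how an XLM-RoBERTa model might be fine-tuned for the three sentiment classes (Negative, Neutral, Positive) named in the abstract. The example reviews, hyperparameters, and output directory are invented placeholders for the FiReCS corpus, which is not distributed with this record.

# Minimal sketch (not the authors' code): fine-tuning XLM-RoBERTa for
# three-class sentiment on code-switched Filipino-English reviews.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical in-memory examples standing in for the FiReCS corpus
# (label ids: 0 = Negative, 1 = Neutral, 2 = Positive).
examples = {
    "text": [
        "Sobrang bagal ng delivery, hindi ko na uulitin.",
        "Okay lang ang product, nothing special.",
        "Ang ganda ng quality, very satisfied ako!",
    ],
    "label": [0, 1, 2],
}

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)

def tokenize(batch):
    # Pad/truncate reviews to a fixed length; 128 tokens is an assumption.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    )

train_dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="xlmr-firecs-sentiment",  # hypothetical output directory
    num_train_epochs=3,                  # illustrative hyperparameters
    per_device_train_batch_size=8,
    learning_rate=2e-5,
)

Trainer(model=model, args=training_args, train_dataset=train_dataset).train()

For the zero-shot benchmark mentioned in the abstract, the paper's prompts and exact GPT-3 series model variants are not given in this record. The assumption-laden snippet below only illustrates how a zero-shot sentiment prompt could be posed through the openai Python client, with a current chat model standing in for the GPT-3 series models evaluated in the paper.

# Illustrative zero-shot prompt; not the paper's actual prompt or model.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

review = "Mabilis ang shipping pero medyo pricey yung item."
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # stand-in for the GPT-3 series models in the paper
    messages=[{
        "role": "user",
        "content": (
            "Classify the sentiment of this Filipino-English review as "
            f"Positive, Negative, or Neutral. Reply with one word.\n\n{review}"
        ),
    }],
)
print(response.choices[0].message.content)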