Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models
Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and...
Main Authors: | Cosme, Camilla Johnine; De Leon, Marlene M. |
---|---|
Format: | text |
Published: | Archīum Ateneo 2024 |
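Subjects: | Code-switching; Natural language processing; Online reviews; Sentiment analysis; Transformers; Computer Sciences; Databases and Information Systems; Physical Sciences and Mathematics |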
Online Access: | https://archium.ateneo.edu/discs-faculty-pubs/411 https://doi.org/10.1007/978-981-99-8349-0_11 |
Institution: | Ateneo De Manila University |
id |
ph-ateneo-arc.discs-faculty-pubs-1411 |
---|---|
record_format |
eprints |
spelling |
ph-ateneo-arc.discs-faculty-pubs-1411 2024-04-15T08:20:18Z Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models Cosme, Camilla Johnine De Leon, Marlene M. Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84, with F1-scores of 0.89, 0.86, and 0.78 for the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools illustrates the limitations of systems designed for a single language when they are applied to bilingual text involving code-switching.
2024-01-01T08:00:00Z text https://archium.ateneo.edu/discs-faculty-pubs/411 https://doi.org/10.1007/978-981-99-8349-0_11 Department of Information Systems & Computer Science Faculty Publications Archīum Ateneo Code-switching Natural language processing Online reviews Sentiment analysis Transformers Computer Sciences Databases and Information Systems Physical Sciences and Mathematics |
institution |
Ateneo De Manila University |
building |
Ateneo De Manila University Library |
continent |
Asia |
country |
Philippines |
content_provider |
Ateneo De Manila University Library |
collection |
archium.Ateneo Institutional Repository |
topic |
Code-switching Natural language processing Online reviews Sentiment analysis Transformers Computer Sciences Databases and Information Systems Physical Sciences and Mathematics |
description |
Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84, with F1-scores of 0.89, 0.86, and 0.78 for the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools illustrates the limitations of systems designed for a single language when they are applied to bilingual text involving code-switching. |
format |
text |
author |
Cosme, Camilla Johnine; De Leon, Marlene M. |
title |
Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models |
publisher |
Archīum Ateneo |
publishDate |
2024 |
url |
https://archium.ateneo.edu/discs-faculty-pubs/411 https://doi.org/10.1007/978-981-99-8349-0_11 |
_version_ |
1797546527389908992 |