Characterizing Bias in Word Embeddings Towards Analyzing Gender Associations in Philippine Texts

The steady increase in computational gender bias research has been mostly done on languages for which reliable NLP packages are readily available - such as English, Chinese, and Spanish. This study expands on this area of research by using word embedding bias analysis methods in the Philippine conte...

Bibliographic Details
Main Authors: Gamboa, Lance Calvin, Estuar, Ma. Regina Justina
Format: text
Published: Archīum Ateneo 2023
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/365
https://doi.org/10.1109/AIC57670.2023.10263949
Institution: Ateneo De Manila University
id ph-ateneo-arc.discs-faculty-pubs-1365
record_format eprints
spelling ph-ateneo-arc.discs-faculty-pubs-1365 2024-02-21T05:17:46Z 2023-01-01T08:00:00Z Department of Information Systems & Computer Science Faculty Publications
institution Ateneo De Manila University
building Ateneo De Manila University Library
continent Asia
country Philippines
content_provider Ateneo De Manila University Library
collection archium.Ateneo Institutional Repository
topic gender bias
natural language processing
Philippines
sexism
word embedding association
Artificial Intelligence and Robotics
Computer Engineering
Electrical and Computer Engineering
Engineering
description The steady increase in computational gender bias research has been mostly done on languages for which reliable NLP packages are readily available - such as English, Chinese, and Spanish. This study expands on this area of research by using word embedding bias analysis methods in the Philippine context. To this end, Philippine media textual corpora consisting of 380 million English words and 921 million Filipino words were compiled and used to train FastText embeddings. These embeddings were then subjected to validation and to the Word Embedding Association Test (WEAT) to characterize bias in the embeddings and in the texts they were trained on. Results show that Filipino texts are associated with the heterosexual male by default, but the strongest biases relate to the female and the non-heterosexual. Meanwhile, media texts written in English generally have more balanced gender associations compared to texts written in Filipino. Furthermore, the Filipino corpus links action more to the male, and objects and social roles to the female. On the other hand, implicitly gendered words in English texts are mostly nouns. These results contribute to demonstrations of how WEAT can be applied in low-resource languages, such as Filipino.
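The description names the Word Embedding Association Test (WEAT) but does not spell it out. As a rough illustration only, the sketch below computes the standard WEAT effect size for two target word sets (X, Y) and two attribute word sets (A, B) using cosine similarity. The function names and the two-dimensional toy vectors are hypothetical and are not taken from the study, whose embeddings were FastText vectors trained on the Philippine corpora. The use of the sample standard deviation (ddof=1) is also an assumption.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, scaled by the
    standard deviation of associations over all target words.
    ddof=1 (sample standard deviation) is an assumption of this sketch."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return float((np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1))

# Toy vectors standing in for real embeddings (hypothetical values).
A = [np.array([1.0, 0.0])]                         # attribute set A
B = [np.array([0.0, 1.0])]                         # attribute set B
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]   # targets close to A
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]   # targets close to B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value indicates that the X targets sit closer to attribute set A than the Y targets do. Swapping X and Y flips the sign.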
format text
author Gamboa, Lance Calvin
Estuar, Ma. Regina Justina
title Characterizing Bias in Word Embeddings Towards Analyzing Gender Associations in Philippine Texts
publisher Archīum Ateneo
publishDate 2023
url https://archium.ateneo.edu/discs-faculty-pubs/365
https://doi.org/10.1109/AIC57670.2023.10263949