She elicits requirements and he tests: Software engineering gender bias in large language models

Implicit gender bias in software development is a well-documented issue, such as the association of technical roles with men. To address this bias, it is important to understand it in more detail. This study uses data mining techniques to investigate the extent to which 56 tasks related to software...

Bibliographic Details
Main Authors: TREUDE, Christoph, HATA, Hideaki
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
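Subjects: gender bias; large language models; software engineering; Programming Languages and Compilers; Software Engineering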
Online Access: https://ink.library.smu.edu.sg/sis_research/8865
https://ink.library.smu.edu.sg/context/sis_research/article/9868/viewcontent/genderbias.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9868
record_format dspace
spelling sg-smu-ink.sis_research-9868
2024-06-13T09:10:36Z
2023-05-01T07:00:00Z
text
application/pdf
info:doi/10.1109/MSR59073.2023.00088
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic gender bias
large language models
software engineering
Programming Languages and Compilers
Software Engineering
description Implicit gender bias in software development is a well-documented issue, such as the association of technical roles with men. To address this bias, it is important to understand it in more detail. This study uses data mining techniques to investigate the extent to which 56 tasks related to software development, such as assigning GitHub issues and testing, are affected by implicit gender bias embedded in large language models. We systematically translated each task from English into a genderless language and back, and investigated the pronouns associated with each task. Based on translating each task 100 times in different permutations, we identify a significant disparity in the gendered pronoun associations with different tasks. Specifically, requirements elicitation was associated with the pronoun “he” in only 6% of cases, while testing was associated with “he” in 100% of cases. Additionally, tasks related to helping others had a 91% association with “he” while the same association for tasks related to asking coworkers was only 52%. These findings reveal a clear pattern of gender bias related to software development tasks and have important implications for addressing this issue both in the training of large language models and in broader society.
format text
author TREUDE, Christoph
HATA, Hideaki
title She elicits requirements and he tests: Software engineering gender bias in large language models