Bias in the age of generative AI: a deep dive into autoregressive model fairness
This study presents a comprehensive evaluation of biases in prominent autoregressive language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically assesses the models' performance across multiple dimensions of bias, including toxicity, gender, race, religion, and LGBTQIA+ identities.
Saved in:
Main Author: Ng, Darren Joon Kai
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/181069
Institution: Nanyang Technological University
Citation: Ng, D. J. K. (2024). Bias in the age of generative AI: a deep dive into autoregressive model fairness. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181069
Description:
This study presents a comprehensive evaluation of biases in prominent autoregressive
language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically
assesses the models' performance across multiple dimensions of bias, including toxicity,
gender, race, religion, and LGBTQIA+ identities.
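
The generation step underlying all of these evaluations can be illustrated with a short sketch. The following is a minimal example, not the thesis's actual harness, using GPT-2 (the smallest of the three models studied) through the HuggingFace transformers library; the prompt string and sampling settings are illustrative assumptions.

```python
# Minimal sketch: sample a continuation from GPT-2, the basic operation
# that the bias benchmarks below all score in some form.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The woman worked as"  # illustrative prompt, not from the thesis
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,  # sampled continuations, as open-ended bias benchmarks typically use
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```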
To evaluate toxicity, the study employs the RealToxicityPrompts dataset and the
"roberta-hate-speech-dynabench-r4" model. Gender, racial and religious biases are examined
using the BOLD dataset and the REGARD metric, while the HONEST benchmark is
leveraged to assess biases against LGBTQIA+ identities.
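
The toxicity-scoring step can be sketched as follows, assuming the classifier is the public "facebook/roberta-hate-speech-dynabench-r4-target" checkpoint on the HuggingFace Hub (the thesis names it only as "roberta-hate-speech-dynabench-r4", so the exact checkpoint may differ); the example continuations are made up. The REGARD and HONEST measurements are likewise available through the HuggingFace evaluate library, though the thesis does not specify which implementation it used.

```python
# Hedged sketch of toxicity scoring: classify model continuations with a
# RoBERTa hate-speech classifier trained on the Dynabench round-4 data.
from transformers import pipeline

toxicity_clf = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",  # assumed checkpoint
)

continuations = [
    "Everyone deserves equal respect regardless of background.",
    "People from that group are all the same.",
]
for text, result in zip(continuations, toxicity_clf(continuations)):
    # The classifier returns a label ("hate" / "nothate") with a confidence score.
    print(f"{result['label']:>8} ({result['score']:.2f})  {text}")
```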
Notably, the research explores the effectiveness of structured prompts, particularly zero-shot
Chain-of-Thought (CoT)-based implication prompting, as a debiasing technique. The results
demonstrate the potential of this approach to mitigate biases across various domains, with
Llama-7B exhibiting the most consistent and substantial improvements. However, the study
also highlights the challenges in effectively debiasing LGBTQIA+ biases, underscoring the
need for more targeted and specialised techniques in this area.
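
The abstract does not reproduce the exact wording of the zero-shot CoT implication prompt, so the template below is a hypothetical reconstruction of the general pattern: before continuing a text, the model is asked to reason step by step about what a continuation would imply about the people or groups mentioned, and to avoid stereotyped or toxic implications.

```python
# Hypothetical reconstruction of a zero-shot CoT "implication" debiasing
# prefix; the thesis's actual prompt wording is not given in the abstract.
def implication_prompt(text_to_continue: str) -> str:
    return (
        "Before continuing the text, let's think step by step about what the "
        "continuation could imply about the people or groups mentioned, and "
        "avoid continuations that rely on stereotypes or toxic implications.\n\n"
        f"Text to continue: {text_to_continue}\n"
        "Continuation:"
    )

print(implication_prompt("The woman worked as"))
```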
Overall, this work provides a comprehensive understanding of the biases present in
contemporary autoregressive language models and offers insights into effective strategies for
bias mitigation, paving the way for the development of more equitable and inclusive AI
systems.