Bias in the age of generative AI: a deep dive into autoregressive model fairness
| Main Author: | |
|---|---|
| Other Authors: | |
| Format: | Final Year Project |
| Language: | English |
| Published: | Nanyang Technological University, 2024 |
| Subjects: | |
| Online Access: | https://hdl.handle.net/10356/181069 |
| Institution: | Nanyang Technological University |
Summary: This study presents a comprehensive evaluation of biases in prominent autoregressive
language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically
assesses the models' performance across multiple dimensions of bias, including toxicity,
gender, race, religion, and LGBTQIA+ identities.
To evaluate toxicity, the study employs the RealToxicityPrompts dataset and the
"roberta-hate-speech-dynabench-r4" model. Gender, racial and religious biases are examined
using the BOLD dataset and the REGARD metric, while the HONEST benchmark is
leveraged to assess biases against LGBTQIA+ identities.
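The record itself contains no code, but all three measurements named above are available in the Hugging Face `evaluate` library. The sketch below shows how such scoring is typically run; it assumes the library's default checkpoints (its "toxicity" measurement defaults to facebook/roberta-hate-speech-dynabench-r4-target, which matches the classifier cited) stand in for the study's exact setup, and the example completions are invented for illustration.

```python
# Illustrative sketch (not from the report): scoring model completions with
# the Hugging Face `evaluate` measurements named in the summary.
import evaluate

completions = [
    "She worked as a nurse and was kind to everyone.",
    "He was known for being loud and aggressive.",
]

# Toxicity: per-completion toxicity probability from the default
# facebook/roberta-hate-speech-dynabench-r4-target classifier.
toxicity = evaluate.load("toxicity", module_type="measurement")
print(toxicity.compute(predictions=completions)["toxicity"])

# REGARD: polarity (positive/negative/neutral/other) of language
# about a demographic group, as used with BOLD-style prompts.
regard = evaluate.load("regard", module_type="measurement")
print(regard.compute(data=completions)["regard"])

# HONEST: rate of hurtful words in completions, which the measurement
# expects as lists of tokens rather than raw strings.
honest = evaluate.load("honest", "en")
print(honest.compute(predictions=[c.split() for c in completions]))
```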
Notably, the research explores the effectiveness of structured prompts, particularly zero-shot
Chain-of-Thought (CoT)-based implication prompting, as a debiasing technique. The results
demonstrate the potential of this approach to mitigate biases across various domains, with
Llama-7B exhibiting the most consistent and substantial improvements. However, the study
also highlights the challenges in effectively debiasing LGBTQIA+ biases, underscoring the
need for more targeted and specialised techniques in this area.
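The summary does not reproduce the study's prompt wording, so the following is only a hypothetical sketch of what a zero-shot CoT implication-prompting wrapper might look like; the instruction text and the gpt2 checkpoint are illustrative assumptions, not the study's actual prompts or models.

```python
# Hypothetical sketch of zero-shot CoT implication prompting as a
# debiasing wrapper; the exact prompt used in the study is not given
# in this record, so this phrasing is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def implication_prompt(prompt: str) -> str:
    # Prepend a zero-shot Chain-of-Thought instruction asking the model
    # to reason about what its continuation would imply before writing it.
    return (
        "Before continuing the text, think step by step about what the "
        "continuation would imply about the people mentioned, and avoid "
        "harmful stereotypes.\n\n" + prompt
    )

out = generator(implication_prompt("The woman worked as"),
                max_new_tokens=30, do_sample=False)
print(out[0]["generated_text"])
```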
Overall, this work provides a comprehensive understanding of the biases present in
contemporary autoregressive language models and offers insights into effective strategies for
bias mitigation, paving the way for the development of more equitable and inclusive AI
systems.