Bias in the age of generative AI: a deep dive into autoregressive model fairness

Bibliographic Details
Main Author: Ng, Darren Joon Kai
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/181069
Description
Summary: This study presents a comprehensive evaluation of biases in prominent autoregressive language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically assesses the models' performance across multiple dimensions of bias, including toxicity, gender, race, religion, and LGBTQIA+ identities. To evaluate toxicity, the study employs the RealToxicityPrompts dataset and the "roberta-hate-speech-dynabench-r4" model. Gender, racial, and religious biases are examined using the BOLD dataset and the REGARD metric, while the HONEST benchmark is leveraged to assess biases against LGBTQIA+ identities. Notably, the research explores the effectiveness of structured prompts, particularly zero-shot Chain-of-Thought (CoT)-based implication prompting, as a debiasing technique. The results demonstrate the potential of this approach to mitigate biases across various domains, with Llama-7B exhibiting the most consistent and substantial improvements. However, the study also highlights the challenges in effectively debiasing LGBTQIA+ biases, underscoring the need for more targeted and specialised techniques in this area. Overall, this work provides a comprehensive understanding of the biases present in contemporary autoregressive language models and offers insights into effective strategies for bias mitigation, paving the way for the development of more equitable and inclusive AI systems.
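
A minimal sketch of the toxicity-evaluation step described above, using the Hugging Face `datasets`, `transformers`, and `evaluate` libraries. The Hub identifier `allenai/real-toxicity-prompts` and the classifier behind the `toxicity` measurement (`facebook/roberta-hate-speech-dynabench-r4-target`) are assumed to correspond to the resources named in the abstract; the generation settings and sample size are illustrative, not the thesis's exact configuration.

```python
# Sketch: generate continuations for RealToxicityPrompts prompts with GPT-2
# and score them with the roberta-hate-speech-dynabench-r4 hate-speech classifier.
# Identifiers and settings are assumptions, not the thesis's exact setup.
from datasets import load_dataset
from transformers import pipeline
import evaluate

# Load a small slice of RealToxicityPrompts (public Hub ID assumed).
prompts = load_dataset("allenai/real-toxicity-prompts", split="train[:100]")
prompt_texts = [row["prompt"]["text"] for row in prompts]

# Generate short continuations with GPT-2; Llama-7B or Mistral-7B could be
# substituted by changing the model name.
generator = pipeline("text-generation", model="gpt2")
continuations = []
for p in prompt_texts:
    out = generator(p, max_new_tokens=30, do_sample=True, pad_token_id=50256)
    continuations.append(out[0]["generated_text"][len(p):])  # strip the prompt

# The `toxicity` measurement wraps facebook/roberta-hate-speech-dynabench-r4-target
# by default and returns one toxicity probability per input text.
toxicity = evaluate.load("toxicity", module_type="measurement")
scores = toxicity.compute(predictions=continuations)["toxicity"]
print(f"mean toxicity over {len(scores)} continuations: {sum(scores) / len(scores):.4f}")
```

The REGARD and HONEST evaluations can be wired up analogously: the same library exposes `evaluate.load("regard")` and `evaluate.load("honest", "en")` measurements, which score polarity toward demographic groups and the rate of hurtful completions, respectively.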