Bias in the age of generative AI: a deep dive into autoregressive model fairness

This study presents a comprehensive evaluation of biases in prominent autoregressive language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically assesses the models' performance across multiple dimensions of bias, including toxicity, gender, race, religion, and LG...

Full description

Saved in:
Bibliographic Details
Main Author: Ng, Darren Joon Kai
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/181069
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-181069
record_format dspace
spelling sg-ntu-dr.10356-1810692024-11-13T06:50:53Z Bias in the age of generative AI: a deep dive into autoregressive model fairness Ng, Darren Joon Kai Luu Anh Tuan College of Computing and Data Science anhtuan.luu@ntu.edu.sg Computer and Information Science This study presents a comprehensive evaluation of biases in prominent autoregressive language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically assesses the models' performance across multiple dimensions of bias, including toxicity, gender, race, religion, and LGBTQIA+ identities. To evaluate toxicity, the study employs the RealToxicityPrompts dataset and the "roberta-hate-speech-dynabench-r4" model. Gender, racial and religious biases are examined using the BOLD dataset and the REGARD metric, while the HONEST benchmark is leveraged to assess biases against LGBTQIA+ identities. Notably, the research explores the effectiveness of structured prompts, particularly zero-shot Chain-of-Thought (CoT)-based implication prompting, as a debiasing technique. The results demonstrate the potential of this approach to mitigate biases across various domains, with Llama-7B exhibiting the most consistent and substantial improvements. However, the study also highlights the challenges in effectively debiasing LGBTQIA+ biases, underscoring the need for more targeted and specialised techniques in this area. Overall, this work provides a comprehensive understanding of the biases present in contemporary autoregressive language models and offers insights into effective strategies for bias mitigation, paving the way for the development of more equitable and inclusive AI systems. Bachelor's degree 2024-11-13T06:50:52Z 2024-11-13T06:50:52Z 2024 Final Year Project (FYP) Ng, D. J. K. (2024). Bias in the age of generative AI: a deep dive into autoregressive model fairness. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181069 https://hdl.handle.net/10356/181069 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Ng, Darren Joon Kai
Bias in the age of generative AI: a deep dive into autoregressive model fairness
description This study presents a comprehensive evaluation of biases in prominent autoregressive language models, including GPT-2, Llama-7B, and Mistral-7B. The research systematically assesses the models' performance across multiple dimensions of bias, including toxicity, gender, race, religion, and LGBTQIA+ identities. To evaluate toxicity, the study employs the RealToxicityPrompts dataset and the "roberta-hate-speech-dynabench-r4" model. Gender, racial and religious biases are examined using the BOLD dataset and the REGARD metric, while the HONEST benchmark is leveraged to assess biases against LGBTQIA+ identities. Notably, the research explores the effectiveness of structured prompts, particularly zero-shot Chain-of-Thought (CoT)-based implication prompting, as a debiasing technique. The results demonstrate the potential of this approach to mitigate biases across various domains, with Llama-7B exhibiting the most consistent and substantial improvements. However, the study also highlights the challenges in effectively debiasing LGBTQIA+ biases, underscoring the need for more targeted and specialised techniques in this area. Overall, this work provides a comprehensive understanding of the biases present in contemporary autoregressive language models and offers insights into effective strategies for bias mitigation, paving the way for the development of more equitable and inclusive AI systems.
author2 Luu Anh Tuan
author_facet Luu Anh Tuan
Ng, Darren Joon Kai
format Final Year Project
author Ng, Darren Joon Kai
author_sort Ng, Darren Joon Kai
title Bias in the age of generative AI: a deep dive into autoregressive model fairness
title_short Bias in the age of generative AI: a deep dive into autoregressive model fairness
title_full Bias in the age of generative AI: a deep dive into autoregressive model fairness
title_fullStr Bias in the age of generative AI: a deep dive into autoregressive model fairness
title_full_unstemmed Bias in the age of generative AI: a deep dive into autoregressive model fairness
title_sort bias in the age of generative ai: a deep dive into autoregressive model fairness
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181069
_version_ 1816859016624603136