Bias problems in large language models and how to mitigate them

Pretrained Language Models (PLMs) like ChatGPT have become integral to various industries, revolutionising applications from customer service to software development. However, these PLMs are often trained on vast, unmoderated datasets, which may contain social biases that can be propagated in the m...

Full description

Saved in:
Bibliographic Details
Main Author: Ong, Adrian Zhi Ying
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/181163
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-181163
record_format dspace
spelling sg-ntu-dr.10356-1811632024-11-18T01:16:32Z Bias problems in large language models and how to mitigate them Ong, Adrian Zhi Ying Luu Anh Tuan College of Computing and Data Science anhtuan.luu@ntu.edu.sg Computer and Information Science Bias Large language model Pretrained Language Models (PLMs) like ChatGPT have become integral to various industries, revolutionising applications from customer service to software development. However, these PLMs are often trained on vast, unmoderated datasets, which may contain social biases that can be propagated in the models' outputs. This study evaluates the effectiveness of five debiasing techniques: Self-Debias, Counterfactual Data Augmentation (CDA), SentenceDebias, Iterative Nullspace Projection (INLP), and Dropout regularization on three autoregressive large parameter models: GPT-2, Phi-2, and Llama-2. Focusing on bias categories: gender, race, and religion, specifically in U.S. and Singapore contexts, leveraging on established bias benchmarking datasets: CrowS-Pairs and StereoSet. The study found that Self-Debias is the most effective bias mitigation strategy, consistently reducing bias across all tested scenarios. However, with potential significant trade-offs in model performance in downstream tasks. Bias mitigation is more effective in the U.S. as compared to Singapore datasets primarily due to the scarcity of Singapore context training data used as training data. The study emphasizes the complexity of bias mitigation, highlighting the need for careful assessment in balancing the trade-off between bias reduction and model performance, as well as the importance of curating context-specific datasets. Finally, concluding with practical recommendations for future research and industry applications. Bachelor's degree 2024-11-18T01:15:59Z 2024-11-18T01:15:59Z 2024 Final Year Project (FYP) Ong, A. Z. Y. (2024). Bias problems in large language models and how to mitigate them. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181163 https://hdl.handle.net/10356/181163 en SCSE23-1077 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Bias
Large language model
spellingShingle Computer and Information Science
Bias
Large language model
Ong, Adrian Zhi Ying
Bias problems in large language models and how to mitigate them
description Pretrained Language Models (PLMs) like ChatGPT have become integral to various industries, revolutionising applications from customer service to software development. However, these PLMs are often trained on vast, unmoderated datasets, which may contain social biases that can be propagated in the models' outputs. This study evaluates the effectiveness of five debiasing techniques: Self-Debias, Counterfactual Data Augmentation (CDA), SentenceDebias, Iterative Nullspace Projection (INLP), and Dropout regularization on three autoregressive large parameter models: GPT-2, Phi-2, and Llama-2. Focusing on bias categories: gender, race, and religion, specifically in U.S. and Singapore contexts, leveraging on established bias benchmarking datasets: CrowS-Pairs and StereoSet. The study found that Self-Debias is the most effective bias mitigation strategy, consistently reducing bias across all tested scenarios. However, with potential significant trade-offs in model performance in downstream tasks. Bias mitigation is more effective in the U.S. as compared to Singapore datasets primarily due to the scarcity of Singapore context training data used as training data. The study emphasizes the complexity of bias mitigation, highlighting the need for careful assessment in balancing the trade-off between bias reduction and model performance, as well as the importance of curating context-specific datasets. Finally, concluding with practical recommendations for future research and industry applications.
author2 Luu Anh Tuan
author_facet Luu Anh Tuan
Ong, Adrian Zhi Ying
format Final Year Project
author Ong, Adrian Zhi Ying
author_sort Ong, Adrian Zhi Ying
title Bias problems in large language models and how to mitigate them
title_short Bias problems in large language models and how to mitigate them
title_full Bias problems in large language models and how to mitigate them
title_fullStr Bias problems in large language models and how to mitigate them
title_full_unstemmed Bias problems in large language models and how to mitigate them
title_sort bias problems in large language models and how to mitigate them
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181163
_version_ 1816858994080219136