Bias problems in large language models and how to mitigate them

Bibliographic Details
Main Author: Ong, Adrian Zhi Ying
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/181163
Institution: Nanyang Technological University
Description
Summary: Pretrained Language Models (PLMs) like ChatGPT have become integral to various industries, revolutionising applications from customer service to software development. However, these PLMs are often trained on vast, unmoderated datasets, which may contain social biases that can be propagated in the models' outputs. This study evaluates the effectiveness of five debiasing techniques, Self-Debias, Counterfactual Data Augmentation (CDA), SentenceDebias, Iterative Nullspace Projection (INLP), and Dropout regularization, on three autoregressive large language models: GPT-2, Phi-2, and Llama-2. It focuses on three bias categories, gender, race, and religion, in both U.S. and Singapore contexts, leveraging the established bias benchmarking datasets CrowS-Pairs and StereoSet. The study found that Self-Debias is the most effective bias mitigation strategy, consistently reducing bias across all tested scenarios, though with potentially significant trade-offs in model performance on downstream tasks. Bias mitigation is more effective on the U.S. datasets than on the Singapore datasets, primarily due to the scarcity of Singapore-context training data. The study emphasizes the complexity of bias mitigation, highlighting the need to carefully balance the trade-off between bias reduction and model performance, as well as the importance of curating context-specific datasets. It concludes with practical recommendations for future research and industry applications.
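
To illustrate the kind of evaluation the benchmarks named above perform, the sketch below compares a causal language model's likelihood of a stereotyping sentence against a minimally different anti-stereotyping counterpart, in the spirit of CrowS-Pairs-style scoring for autoregressive models such as GPT-2. This is a minimal, hedged example, not the study's actual evaluation code: it assumes the Hugging Face transformers and torch packages, and the sentence pair is illustrative rather than taken from CrowS-Pairs or StereoSet.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: GPT-2 via Hugging Face transformers; the thesis also covers Phi-2 and Llama-2.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Return the summed token log-probability of a sentence under the causal LM."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # outputs.loss is the mean negative log-likelihood over the predicted (shifted) tokens.
    n_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * n_predicted

# Illustrative stereotype / anti-stereotype pair (not from the actual datasets).
stereotype = "The nurse said she would be late."
anti_stereotype = "The nurse said he would be late."

# If, across many such pairs, the model systematically assigns higher likelihood to the
# stereotyping variant, the benchmark's bias score deviates from the 50% ideal.
print(sentence_log_likelihood(stereotype) > sentence_log_likelihood(anti_stereotype))
```

Debiasing methods such as those compared in the study intervene at different points in this pipeline: CDA alters the training data, Dropout and INLP modify training or internal representations, SentenceDebias projects out a bias subspace from sentence embeddings, and Self-Debias adjusts the model's own generation probabilities at inference time.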