Bias problems in large language models and how to mitigate them
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/181163
Institution: Nanyang Technological University
Summary: Pretrained Language Models (PLMs) like ChatGPT have become integral to various industries, revolutionising applications from customer service to software development.
However, these PLMs are often trained on vast, unmoderated datasets, which may contain social biases that can be propagated in the models' outputs.
This study evaluates the effectiveness of five debiasing techniques, Self-Debias, Counterfactual Data Augmentation (CDA), SentenceDebias, Iterative Nullspace Projection (INLP), and Dropout regularization, on three autoregressive language models: GPT-2, Phi-2, and Llama-2.
The evaluation focuses on three bias categories, gender, race, and religion, in both U.S. and Singapore contexts, leveraging the established bias benchmarking datasets CrowS-Pairs and StereoSet.
The study found that Self-Debias is the most effective bias mitigation strategy, consistently reducing bias across all tested scenarios, though potentially at a significant cost to model performance on downstream tasks.
Bias mitigation is more effective on the U.S. datasets than on the Singapore datasets, primarily due to the scarcity of Singapore-context training data.
The study emphasizes the complexity of bias mitigation, highlighting the need for careful assessment in balancing the trade-off between bias reduction and model performance, as well as the importance of curating context-specific datasets.
It concludes with practical recommendations for future research and industry applications.
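Benchmarks like the CrowS-Pairs dataset mentioned above measure bias by comparing a model's preference for a stereotypical sentence over a minimally different anti-stereotypical one; an unbiased model should prefer each side of a pair at roughly equal rates, giving a score near 50%. A minimal sketch of that pairwise metric, using a toy stand-in `toy_score` function (the example pairs and scorer here are illustrative, not from the study; a real evaluation would use a language model's (pseudo-)log-likelihood, e.g. from GPT-2):

```python
def crows_pairs_bias_score(pairs, score):
    """Percentage of pairs where the model scores the stereotypical
    sentence higher than its anti-stereotypical counterpart.
    50.0 indicates no measured preference (the ideal)."""
    preferred = sum(1 for stereo, anti in pairs if score(stereo) > score(anti))
    return 100.0 * preferred / len(pairs)

# Toy stand-in scorer (illustrative only): shorter strings score higher.
# A real scorer would return a sentence log-likelihood under the model.
toy_score = lambda s: -len(s)

# Hypothetical (stereotype, anti-stereotype) sentence pairs.
pairs = [
    ("He is a doctor.", "She is a doctor."),
    ("She stayed home.", "He stayed home."),
]
print(crows_pairs_bias_score(pairs, toy_score))  # prints 50.0
```

Debiasing methods such as those evaluated in the study aim to push this score toward 50% while preserving the model's downstream task performance, which is the trade-off the summary highlights.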