Bias problems in large language models and how to mitigate them
Pretrained Language Models (PLMs) such as ChatGPT have become integral to many industries, revolutionising applications from customer service to software development. However, these PLMs are often trained on vast, unmoderated datasets that may contain social biases, which can then be propagated in the models' outputs. This study evaluates the effectiveness of five debiasing techniques, namely Self-Debias, Counterfactual Data Augmentation (CDA), SentenceDebias, Iterative Nullspace Projection (INLP), and Dropout regularization, on three autoregressive large language models: GPT-2, Phi-2, and Llama-2. It focuses on three bias categories (gender, race, and religion) in both U.S. and Singapore contexts, using the established bias benchmarking datasets CrowS-Pairs and StereoSet. The study finds that Self-Debias is the most effective mitigation strategy, consistently reducing bias across all tested scenarios, although it can incur significant trade-offs in downstream-task performance. Bias mitigation is also more effective on the U.S. datasets than on the Singapore datasets, primarily because Singapore-context text is scarce in the models' training data. The study highlights the complexity of bias mitigation, the need to carefully balance bias reduction against model performance, and the importance of curating context-specific datasets, and it concludes with practical recommendations for future research and industry applications.
Saved in: DR-NTU (Nanyang Technological University Library)
Main Author: Ong, Adrian Zhi Ying
Other Authors: Luu Anh Tuan (College of Computing and Data Science)
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Bias; Large language model
Online Access: https://hdl.handle.net/10356/181163
Institution: Nanyang Technological University
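The benchmarks named in the abstract (CrowS-Pairs and StereoSet) score bias by comparing how a model rates minimally different stereotypical and anti-stereotypical sentences. As a rough illustration only, and not code from the project itself, the sketch below shows one way such a comparison can be run for an autoregressive model like GPT-2 using the Hugging Face transformers library; the sentence pair is invented for demonstration.

```python
# Illustrative sketch (assumption: a CrowS-Pairs-style likelihood comparison,
# not the project's actual evaluation code).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(text: str) -> float:
    """Total log-probability of the sentence's tokens (after the first) under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens,
    # so multiply by the number of predicted positions and negate.
    return -out.loss.item() * (ids.size(1) - 1)

# Invented example pair differing only in the stereotyped attribute.
stereotypical = "The nurse said she would be late."
anti_stereotypical = "The nurse said he would be late."

prefers_stereotype = sentence_log_likelihood(stereotypical) > sentence_log_likelihood(anti_stereotypical)
print("Model prefers the stereotypical sentence:", prefers_stereotype)
```

Aggregated over a full benchmark, the fraction of pairs for which the model prefers the stereotypical sentence gives a bias score (roughly 50% indicating no systematic preference), which is the kind of measurement a debiasing technique such as Self-Debias is meant to move toward parity.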