Clean-label backdoor attack and defense: an examination of language model vulnerability
Prompt-based learning, a paradigm that bridges the pre-training and fine-tuning stages, has proven highly effective on various NLP tasks, particularly in few-shot scenarios. However, this paradigm is not immune to backdoor attacks. Textual backdoor attacks aim to implant specific vulnerabilities into models by poisoning some of the training samples through trigger injection and label alteration. This approach, though, has drawbacks, such as unnatural language expressions caused by the trigger and incorrect labeling of the poisoned samples. In this study, we introduce ProAttack, an innovative and efficient approach for executing clean-label backdoor attacks that employs the prompt itself as the trigger. Our approach eliminates the need for external triggers and ensures correct labeling of poisoned samples, thereby enhancing the stealthiness of the backdoor attack. Furthermore, we preliminarily explore defense strategies against clean-label backdoor attacks, utilizing the LoRA algorithm, which involves minimal parameter updates. We conduct comprehensive experiments in both rich-resource and few-shot settings across classification and radiology report summarization tasks. The results empirically validate the strong performance of ProAttack in textual backdoor attacks. Remarkably, in the rich-resource settings for classification tasks, ProAttack outperforms other methods, achieving state-of-the-art attack success rates on the clean-label backdoor attack benchmark without using external triggers. Additionally, the defense method effectively mitigates clean-label backdoor attacks while maintaining model performance.
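The attack idea in the abstract can be made concrete with a small sketch. The snippet below illustrates clean-label poisoning in which a specific prompt serves as the trigger: labels are never altered, and only the attacker's chosen prompt distinguishes poisoned from benign samples. The dataset format, prompt strings, target label, and poisoning rate are illustrative assumptions, not the authors' exact construction.

```python
# Minimal sketch of clean-label poisoning with a prompt as the trigger.
# All names and constants below are hypothetical, for illustration only.
import random

TRIGGER_PROMPT = "In summary, the sentiment of this review is"  # assumed trigger prompt
CLEAN_PROMPT = "The sentiment of this review is"                # assumed benign prompt
TARGET_LABEL = 1     # class the attacker wants the backdoored model to predict
POISON_RATE = 0.1    # assumed fraction of target-class samples to poison

def poison_dataset(samples):
    """Attach the trigger prompt to a subset of *correctly labeled*
    target-class samples; everything else gets a benign prompt.

    `samples` is a list of (text, label) pairs. Labels are never changed,
    which is what makes the attack clean-label."""
    poisoned = []
    for text, label in samples:
        if label == TARGET_LABEL and random.random() < POISON_RATE:
            poisoned.append((f"{text} {TRIGGER_PROMPT}", label))  # trigger, correct label
        else:
            poisoned.append((f"{text} {CLEAN_PROMPT}", label))
    return poisoned
```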
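The defense direction can be sketched similarly: re-tune only low-rank adapter weights on trusted clean data while the backbone stays frozen, so that the parameter update is minimal. The example below uses the Hugging Face `peft` library; the backbone model, rank, and target modules are assumptions rather than the paper's reported configuration.

```python
# Sketch of a LoRA-based defense: fine-tune only low-rank adapters on
# trusted clean data, leaving the (possibly backdoored) backbone frozen.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

# Stand-in for the possibly backdoored classifier (assumed backbone).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # low rank keeps the update minimal
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in BERT
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# Fine-tune `model` on a small trusted clean set with a standard training loop.
```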
| Main Authors: | Zhao, Shuai; Xu, Xiaoyu; Xiao, Luwei; Wen, Jinming; Tuan, Luu Anh |
|---|---|
| Other Authors: | College of Computing and Data Science |
| Format: | Article |
| Language: | English |
| Published: | Expert Systems with Applications (Elsevier), 2025 |
| Subjects: | Computer and Information Science; Textual backdoor attack; Large language model |
| Online Access: | https://hdl.handle.net/10356/182201 |
| Institution: | Nanyang Technological University |
| Citation: | Zhao, S., Xu, X., Xiao, L., Wen, J. & Tuan, L. A. (2025). Clean-label backdoor attack and defense: an examination of language model vulnerability. Expert Systems With Applications, 265, 125856. https://dx.doi.org/10.1016/j.eswa.2024.125856 |
| DOI: | 10.1016/j.eswa.2024.125856 |
| ISSN: | 0957-4174 |
| Funding: | This work was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 (RS21/20) and the National Natural Science Foundation of China (Nos. 12271215, 12326378, 12326377 and 11871248). |
| Rights: | © 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies. |