Clean-label backdoor attack and defense: an examination of language model vulnerability

Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in few-shot scenarios. However, such a paradigm is not immune to backdoor attacks. Textual backdoor attacks aim at implant...

Full description

Saved in:

Bibliographic Details
Main Authors:	Zhao, Shuai, Xu, Xiaoyu, Xiao, Luwei, Wen, Jinming, Tuan, Luu Anh
Other Authors:	College of Computing and Data Science
Format:	Article
Language:	English
Published:	2025
Subjects:	Computer and Information Science Textual backdoor attack Large language model
Online Access:	https://hdl.handle.net/10356/182201
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-182201
record_format	dspace
spelling	sg-ntu-dr.10356-1822012025-01-14T05:19:51Z Clean-label backdoor attack and defense: an examination of language model vulnerability Zhao, Shuai Xu, Xiaoyu Xiao, Luwei Wen, Jinming Tuan, Luu Anh College of Computing and Data Science Computer and Information Science Textual backdoor attack Large language model Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in few-shot scenarios. However, such a paradigm is not immune to backdoor attacks. Textual backdoor attacks aim at implanting specific vulnerabilities into models by poisoning some of the training samples via the injection of triggers and the alteration of labels. This approach, though, has its drawbacks, such as unnatural language expressions due to the trigger and incorrect labeling of the poisoned samples. In this study, we introduce ProAttack, an innovative and efficient approach for executing clean-label backdoor attacks that employ the prompt as a trigger. Our approach eliminates the need for external triggers, and ensures correct labeling of poisoned samples, thereby enhancing the stealthy nature of the backdoor attack. Furthermore, we preliminarily explore defense strategies against clean-label backdoor attacks, utilizing the LoRA algorithm which involves minimal parameter updates. We execute comprehensive experiments in both rich-resource and few-shot settings across classification and radiology report summarization tasks. The results empirically validate the strong performance of ProAttack in the field of textual backdoor attacks. Remarkably, within the rich-resource settings for classification tasks, ProAttack outperforms other methods, achieving state-of-the-art attack success rates in the clean-label backdoor attack benchmark without utilizing external triggers. Additionally, the defense method effectively mitigates clean-label backdoor attacks while maintaining the performance of the model. Ministry of Education (MOE) This work was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 (RS21/20), the National Natural Science Foundation of China (Nos. 12271215, 12326378, 12326377 and 11871248). 2025-01-14T05:19:51Z 2025-01-14T05:19:51Z 2025 Journal Article Zhao, S., Xu, X., Xiao, L., Wen, J. & Tuan, L. A. (2025). Clean-label backdoor attack and defense: an examination of language model vulnerability. Expert Systems With Applications, 265, 125856-. https://dx.doi.org/10.1016/j.eswa.2024.125856 0957-4174 https://hdl.handle.net/10356/182201 10.1016/j.eswa.2024.125856 2-s2.0-85211239141 265 125856 en RS21/20 Expert Systems with Applications © 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science Textual backdoor attack Large language model
spellingShingle	Computer and Information Science Textual backdoor attack Large language model Zhao, Shuai Xu, Xiaoyu Xiao, Luwei Wen, Jinming Tuan, Luu Anh Clean-label backdoor attack and defense: an examination of language model vulnerability
description	Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in few-shot scenarios. However, such a paradigm is not immune to backdoor attacks. Textual backdoor attacks aim at implanting specific vulnerabilities into models by poisoning some of the training samples via the injection of triggers and the alteration of labels. This approach, though, has its drawbacks, such as unnatural language expressions due to the trigger and incorrect labeling of the poisoned samples. In this study, we introduce ProAttack, an innovative and efficient approach for executing clean-label backdoor attacks that employ the prompt as a trigger. Our approach eliminates the need for external triggers, and ensures correct labeling of poisoned samples, thereby enhancing the stealthy nature of the backdoor attack. Furthermore, we preliminarily explore defense strategies against clean-label backdoor attacks, utilizing the LoRA algorithm which involves minimal parameter updates. We execute comprehensive experiments in both rich-resource and few-shot settings across classification and radiology report summarization tasks. The results empirically validate the strong performance of ProAttack in the field of textual backdoor attacks. Remarkably, within the rich-resource settings for classification tasks, ProAttack outperforms other methods, achieving state-of-the-art attack success rates in the clean-label backdoor attack benchmark without utilizing external triggers. Additionally, the defense method effectively mitigates clean-label backdoor attacks while maintaining the performance of the model.
author2	College of Computing and Data Science
author_facet	College of Computing and Data Science Zhao, Shuai Xu, Xiaoyu Xiao, Luwei Wen, Jinming Tuan, Luu Anh
format	Article
author	Zhao, Shuai Xu, Xiaoyu Xiao, Luwei Wen, Jinming Tuan, Luu Anh
author_sort	Zhao, Shuai
title	Clean-label backdoor attack and defense: an examination of language model vulnerability
title_short	Clean-label backdoor attack and defense: an examination of language model vulnerability
title_full	Clean-label backdoor attack and defense: an examination of language model vulnerability
title_fullStr	Clean-label backdoor attack and defense: an examination of language model vulnerability
title_full_unstemmed	Clean-label backdoor attack and defense: an examination of language model vulnerability
title_sort	clean-label backdoor attack and defense: an examination of language model vulnerability
publishDate	2025
url	https://hdl.handle.net/10356/182201
_version_	1821279351518265344

Clean-label backdoor attack and defense: an examination of language model vulnerability

Similar Items