Clean-label backdoor attack and defense: an examination of language model vulnerability

Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in few-shot scenarios. However, such a paradigm is not immune to backdoor attacks. Textual backdoor attacks aim at implant...

Full description

Saved in:
Bibliographic Details
Main Authors: Zhao, Shuai, Xu, Xiaoyu, Xiao, Luwei, Wen, Jinming, Tuan, Luu Anh
Other Authors: College of Computing and Data Science
Format: Article
Language:English
Published: 2025
Subjects:
Online Access:https://hdl.handle.net/10356/182201
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-182201
record_format dspace
spelling sg-ntu-dr.10356-1822012025-01-14T05:19:51Z Clean-label backdoor attack and defense: an examination of language model vulnerability Zhao, Shuai Xu, Xiaoyu Xiao, Luwei Wen, Jinming Tuan, Luu Anh College of Computing and Data Science Computer and Information Science Textual backdoor attack Large language model Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in few-shot scenarios. However, such a paradigm is not immune to backdoor attacks. Textual backdoor attacks aim at implanting specific vulnerabilities into models by poisoning some of the training samples via the injection of triggers and the alteration of labels. This approach, though, has its drawbacks, such as unnatural language expressions due to the trigger and incorrect labeling of the poisoned samples. In this study, we introduce ProAttack, an innovative and efficient approach for executing clean-label backdoor attacks that employ the prompt as a trigger. Our approach eliminates the need for external triggers, and ensures correct labeling of poisoned samples, thereby enhancing the stealthy nature of the backdoor attack. Furthermore, we preliminarily explore defense strategies against clean-label backdoor attacks, utilizing the LoRA algorithm which involves minimal parameter updates. We execute comprehensive experiments in both rich-resource and few-shot settings across classification and radiology report summarization tasks. The results empirically validate the strong performance of ProAttack in the field of textual backdoor attacks. Remarkably, within the rich-resource settings for classification tasks, ProAttack outperforms other methods, achieving state-of-the-art attack success rates in the clean-label backdoor attack benchmark without utilizing external triggers. Additionally, the defense method effectively mitigates clean-label backdoor attacks while maintaining the performance of the model. Ministry of Education (MOE) This work was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 (RS21/20), the National Natural Science Foundation of China (Nos. 12271215, 12326378, 12326377 and 11871248). 2025-01-14T05:19:51Z 2025-01-14T05:19:51Z 2025 Journal Article Zhao, S., Xu, X., Xiao, L., Wen, J. & Tuan, L. A. (2025). Clean-label backdoor attack and defense: an examination of language model vulnerability. Expert Systems With Applications, 265, 125856-. https://dx.doi.org/10.1016/j.eswa.2024.125856 0957-4174 https://hdl.handle.net/10356/182201 10.1016/j.eswa.2024.125856 2-s2.0-85211239141 265 125856 en RS21/20 Expert Systems with Applications © 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Textual backdoor attack
Large language model
spellingShingle Computer and Information Science
Textual backdoor attack
Large language model
Zhao, Shuai
Xu, Xiaoyu
Xiao, Luwei
Wen, Jinming
Tuan, Luu Anh
Clean-label backdoor attack and defense: an examination of language model vulnerability
description Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in few-shot scenarios. However, such a paradigm is not immune to backdoor attacks. Textual backdoor attacks aim at implanting specific vulnerabilities into models by poisoning some of the training samples via the injection of triggers and the alteration of labels. This approach, though, has its drawbacks, such as unnatural language expressions due to the trigger and incorrect labeling of the poisoned samples. In this study, we introduce ProAttack, an innovative and efficient approach for executing clean-label backdoor attacks that employ the prompt as a trigger. Our approach eliminates the need for external triggers, and ensures correct labeling of poisoned samples, thereby enhancing the stealthy nature of the backdoor attack. Furthermore, we preliminarily explore defense strategies against clean-label backdoor attacks, utilizing the LoRA algorithm which involves minimal parameter updates. We execute comprehensive experiments in both rich-resource and few-shot settings across classification and radiology report summarization tasks. The results empirically validate the strong performance of ProAttack in the field of textual backdoor attacks. Remarkably, within the rich-resource settings for classification tasks, ProAttack outperforms other methods, achieving state-of-the-art attack success rates in the clean-label backdoor attack benchmark without utilizing external triggers. Additionally, the defense method effectively mitigates clean-label backdoor attacks while maintaining the performance of the model.
author2 College of Computing and Data Science
author_facet College of Computing and Data Science
Zhao, Shuai
Xu, Xiaoyu
Xiao, Luwei
Wen, Jinming
Tuan, Luu Anh
format Article
author Zhao, Shuai
Xu, Xiaoyu
Xiao, Luwei
Wen, Jinming
Tuan, Luu Anh
author_sort Zhao, Shuai
title Clean-label backdoor attack and defense: an examination of language model vulnerability
title_short Clean-label backdoor attack and defense: an examination of language model vulnerability
title_full Clean-label backdoor attack and defense: an examination of language model vulnerability
title_fullStr Clean-label backdoor attack and defense: an examination of language model vulnerability
title_full_unstemmed Clean-label backdoor attack and defense: an examination of language model vulnerability
title_sort clean-label backdoor attack and defense: an examination of language model vulnerability
publishDate 2025
url https://hdl.handle.net/10356/182201
_version_ 1821279351518265344