Natural attack for pre-trained models of code
Pre-trained models of code have achieved success in many important software engineering tasks. However, these powerful models are vulnerable to adversarial attacks that slightly perturb model inputs to make a victim model produce wrong outputs. Current works mainly attack models of code with examples that preserve the operational semantics of programs but ignore a fundamental requirement for adversarial example generation: perturbations should be natural to human judges, which we refer to as the naturalness requirement. In this paper, we propose ALERT (Naturalness Aware Attack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Unlike prior works, this paper considers the natural semantics of the generated examples while preserving the operational semantics of the original inputs. Our user study demonstrates that human developers consistently consider the adversarial examples generated by ALERT to be more natural than those generated by the state-of-the-art work by Zhang et al., which ignores the naturalness requirement. When attacking CodeBERT, our approach achieves attack success rates of 53.62%, 27.79%, and 35.78% across three downstream tasks: vulnerability prediction, clone detection, and code authorship attribution. On GraphCodeBERT, our approach achieves average success rates of 76.95%, 7.96%, and 61.47% on the three tasks. On average, these results outperform the baseline by 14.07% and 18.56% on the two pre-trained models. Finally, we investigated the value of the generated adversarial examples for hardening victim models through an adversarial fine-tuning procedure and demonstrated that the accuracy of CodeBERT and GraphCodeBERT against ALERT-generated adversarial examples increased by 87.59% and 92.32%, respectively.
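As a rough illustration of what the abstract means by perturbations that preserve operational semantics, the Python sketch below renames a single identifier and checks whether a victim classifier's prediction flips. This is only a minimal sketch, not the authors' ALERT implementation: the `victim_predict` stub, the hand-picked candidate names, and the greedy loop are illustrative assumptions, whereas ALERT generates naturalness-aware substitute identifiers with the pre-trained models themselves and searches over them (the record's subjects mention a genetic algorithm).

```python
import re


def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename a whole-word identifier; the program's runtime behaviour is unchanged."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def greedy_rename_attack(code: str, target_id: str, candidates, victim_predict):
    """Try natural-looking substitute names until the victim model's label flips."""
    original_label = victim_predict(code)
    for candidate in candidates:
        perturbed = rename_identifier(code, target_id, candidate)
        if victim_predict(perturbed) != original_label:
            return perturbed  # semantics-preserving adversarial example found
    return None  # attack failed for this identifier and candidate set


if __name__ == "__main__":
    snippet = "int buf_len = read(fd, buf, size);"
    # Toy stand-in for a fine-tuned CodeBERT/GraphCodeBERT classifier.
    fake_victim = lambda src: int("buf_len" in src)
    print(greedy_rename_attack(snippet, "buf_len", ["length", "num_read"], fake_victim))
```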
Main Authors: | YANG, Zhou; SHI, Jieke; HE, Junda; LO, David |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
Subjects: | Genetic Algorithm; Adversarial Attack; Pre-Trained Models; Databases and Information Systems; Information Security |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7654 https://ink.library.smu.edu.sg/context/sis_research/article/8657/viewcontent/Natural.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-8657 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-8657 2023-01-10T03:47:33Z; Natural attack for pre-trained models of code; YANG, Zhou; SHI, Jieke; HE, Junda; LO, David; 2022-05-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/7654; info:doi/10.1145/3510003.3510146; https://ink.library.smu.edu.sg/context/sis_research/article/8657/viewcontent/Natural.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Genetic Algorithm; Adversarial Attack; Pre-Trained Models; Databases and Information Systems; Information Security |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Genetic Algorithm; Adversarial Attack; Pre-Trained Models; Databases and Information Systems; Information Security |
description | Pre-trained models of code have achieved success in many important software engineering tasks. However, these powerful models are vulnerable to adversarial attacks that slightly perturb model inputs to make a victim model produce wrong outputs. Current works mainly attack models of code with examples that preserve the operational semantics of programs but ignore a fundamental requirement for adversarial example generation: perturbations should be natural to human judges, which we refer to as the naturalness requirement. In this paper, we propose ALERT (Naturalness Aware Attack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Unlike prior works, this paper considers the natural semantics of the generated examples while preserving the operational semantics of the original inputs. Our user study demonstrates that human developers consistently consider the adversarial examples generated by ALERT to be more natural than those generated by the state-of-the-art work by Zhang et al., which ignores the naturalness requirement. When attacking CodeBERT, our approach achieves attack success rates of 53.62%, 27.79%, and 35.78% across three downstream tasks: vulnerability prediction, clone detection, and code authorship attribution. On GraphCodeBERT, our approach achieves average success rates of 76.95%, 7.96%, and 61.47% on the three tasks. On average, these results outperform the baseline by 14.07% and 18.56% on the two pre-trained models. Finally, we investigated the value of the generated adversarial examples for hardening victim models through an adversarial fine-tuning procedure and demonstrated that the accuracy of CodeBERT and GraphCodeBERT against ALERT-generated adversarial examples increased by 87.59% and 92.32%, respectively. |
format | text |
author | YANG, Zhou; SHI, Jieke; HE, Junda; LO, David |
author_sort | YANG, Zhou |
title | Natural attack for pre-trained models of code |
title_sort | natural attack for pre-trained models of code |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2022 |
url | https://ink.library.smu.edu.sg/sis_research/7654 https://ink.library.smu.edu.sg/context/sis_research/article/8657/viewcontent/Natural.pdf |
_version_ | 1770576409500057600 |