SampDetox : Black-box backdoor defense via perturbation-based sample detoxification

The advancement of Machine Learning has enabled the widespread deployment of Machine Learning as a Service (MLaaS) applications. However, the untrustworthy nature of third-party ML services poses backdoor threats. Existing defenses in MLaaS are limited by their reliance on training samples or white-...

Bibliographic Details
Main Authors: YANG, Yanxin, JIA, Chentao, YAN, Dengke, HU, Ming, LI, Tianlin, XIE, Xiaofei, WEI, Xian, CHEN, Mingsong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/9812
https://ink.library.smu.edu.sg/context/sis_research/article/10812/viewcontent/8771_SampDetox_Black_box_Backd.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10812
record_format dspace
spelling sg-smu-ink.sis_research-10812 2024-12-24T03:47:08Z 2024-12-01T08:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Machine learning
Backdoor threats
Backdoor defense
Information Security
description Advances in machine learning have enabled the widespread deployment of Machine Learning as a Service (MLaaS) applications. However, because third-party ML services cannot be fully trusted, they expose users to backdoor threats. Existing MLaaS defenses rely on training samples or white-box model analysis, which highlights the need for a black-box backdoor purification method. In this paper, we use diffusion models for purification: noise introduced in a forward diffusion process destroys backdoors, and a reverse generative process recovers clean samples. However, because stronger noise also destroys the semantics of the original samples, this approach alone yields poor restoration performance. To investigate how effectively noise eliminates different types of backdoors, we conducted a preliminary study. It shows that low-visibility backdoors are easily destroyed by weak noise, whereas high-visibility backdoors require strong noise to destroy but are easy to detect. Based on this study, we propose SampDetox, which strategically combines weak and strong noise. SampDetox first applies weak noise to eliminate low-visibility backdoors, then compares the structural similarity between the recovered and original samples to localize high-visibility backdoors. Strong noise is then applied only to these localized areas, destroying the high-visibility backdoors while preserving global semantic information. As a result, detoxified samples can be used for inference even by poisoned models. Comprehensive experiments demonstrate the effectiveness of SampDetox in defending against various state-of-the-art backdoor attacks.
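The three stages named in the description (weak global noise, structural-similarity localization, strong local noise) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the block-wise SSIM, the threshold, the noise levels, and the 3x3 mean filter standing in for the reverse diffusion process are all assumptions made for the example. It uses plain Python lists to stay dependency-free; a real pipeline would presumably use an array library and a pretrained diffusion model.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]   # grayscale image, pixel values nominally in [0, 1]
Mask = List[List[bool]]


def forward_noise(img: Image, alpha_bar: float, rng: random.Random,
                  mask: Optional[Mask] = None) -> Image:
    """DDPM-style forward step x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.

    Smaller alpha_bar means stronger noise. If a mask is given, only pixels
    where the mask is True are perturbed; the rest are copied unchanged.
    """
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[s * p + n * rng.gauss(0.0, 1.0) if mask is None or mask[i][j] else p
             for j, p in enumerate(row)]
            for i, row in enumerate(img)]


def block_ssim(a: Image, b: Image, top: int, left: int, size: int) -> float:
    """SSIM of one size x size block, using the usual constants for range 1."""
    xs = [a[i][j] for i in range(top, top + size) for j in range(left, left + size)]
    ys = [b[i][j] for i in range(top, top + size) for j in range(left, left + size)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def localize(original: Image, recovered: Image, size: int, threshold: float) -> Mask:
    """Flag every non-overlapping block whose SSIM falls below the threshold."""
    h, w = len(original), len(original[0])
    mask = [[False] * w for _ in range(h)]
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            if block_ssim(original, recovered, top, left, size) < threshold:
                for i in range(top, top + size):
                    for j in range(left, left + size):
                        mask[i][j] = True
    return mask


def box_blur(img: Image) -> Image:
    """Hypothetical stand-in for the reverse diffusion process: 3x3 mean filter."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def detoxify(img: Image, denoise: Callable[[Image], Image], rng: random.Random,
             weak: float = 0.999, strong: float = 0.3,
             size: int = 4, threshold: float = 0.6) -> Tuple[Image, Mask]:
    """Weak global noise, SSIM-based localization, then strong local noise."""
    stage1 = denoise(forward_noise(img, weak, rng))          # low-visibility triggers
    mask = localize(img, stage1, size, threshold)            # high-visibility regions
    stage2 = denoise(forward_noise(stage1, strong, rng, mask))
    return stage2, mask


# Toy input: a horizontal gradient with a 4x4 checkerboard "trigger" in one corner.
img = [[float((i + j) % 2) if i >= 4 and j >= 4 else 0.2 + 0.05 * j
        for j in range(8)] for i in range(8)]
clean, mask = detoxify(img, box_blur, random.Random(0))
print(sum(v for row in mask for v in row), "pixels flagged for strong noise")
```

The point of the sketch is the control flow: only regions that change markedly under the weak pass receive strong noise, so the rest of the image keeps its semantics. The mean filter cannot restore content the way a generative reverse process would, so the output quality here says nothing about the method itself.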
format text
author YANG, Yanxin
JIA, Chentao
YAN, Dengke
HU, Ming
LI, Tianlin
XIE, Xiaofei
WEI, Xian
CHEN, Mingsong
author_sort YANG, Yanxin
title SampDetox : Black-box backdoor defense via perturbation-based sample detoxification