Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process so that the trained model produces a false output for an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps of clean input samples of the possibly infected target DNN. By learning the feature map distribution with a diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense requires no knowledge of the structure, weights, or training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive: the accuracy of the target model on clean inputs is not affected, and the inference service can run uninterrupted alongside Diffense. Extensive experiments were conducted on DNNs trained for MNIST, CIFAR-10, GTSRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense. The results generally exceed the performance of existing backdoor mitigation methods, including those that require model modifications or prior knowledge of model weights or attack samples.
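The abstract's detection criterion can be sketched as a simple distance test: an input is flagged when its latent feature map lies far from a sample drawn from the diffusion model of the clean-data distribution. The code below is a minimal illustration only; `reconstruct` is a placeholder for the paper's guided sampling procedure, and the threshold and toy values are assumptions, not the actual Diffense implementation.

```python
import numpy as np

def is_backdoored(feature_map, reconstruct, threshold):
    """Flag an input whose latent feature map is far from the clean
    distribution. `reconstruct` stands in for sampling from the trained
    diffusion model under the guidance of `feature_map`."""
    sampled = reconstruct(feature_map)
    dist = np.linalg.norm(feature_map - sampled)  # L2 distance to the sample
    return dist > threshold

# Toy stand-in: the "diffusion model" always returns a clean-like point,
# so inputs far from it are flagged.
clean_ref = np.zeros(8)
reconstruct = lambda f: clean_ref

print(is_backdoored(np.zeros(8), reconstruct, 1.0))      # clean-like input -> False
print(is_backdoored(np.full(8, 5.0), reconstruct, 1.0))  # anomalous input -> True
```

In this framing, only the distance computation and threshold are generic; the substance of the method lies in how the diffusion model is trained on clean latent feature maps and how sampling is guided, which is described in the paper itself.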
Main Authors: Hu, Bowen; Chang, Chip Hong
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2025
Subjects: Engineering; Deep neural networks; AI security
Online Access: https://hdl.handle.net/10356/181984
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-181984
Citation: Hu, B. & Chang, C. H. (2024). Diffense: defense against backdoor attacks on deep neural networks with latent diffusion. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 14(4), 729-742.
DOI: 10.1109/JETCAS.2024.3469377
ISSN: 2156-3357
Handle: https://hdl.handle.net/10356/181984
Scopus ID: 2-s2.0-85206466120
Version: Submitted/Accepted version
Funding: This work was supported by the Ministry of Education (MOE), Singapore, under its Academic Research Fund (AcRF) Tier 2, Award MOE-T2EP50220-0003.
Rights: © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at https://doi.org/10.1109/JETCAS.2024.3469377.
Building: NTU Library
Continent: Asia
Country: Singapore
Content Provider: NTU Library
Collection: DR-NTU