Diffense: defense against backdoor attacks on deep neural networks with latent diffusion

As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attac...

Bibliographic Details
Main Authors: Hu, Bowen, Chang, Chip Hong
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2025
Subjects:
Online Access:https://hdl.handle.net/10356/181984
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-181984
record_format dspace
spelling sg-ntu-dr.10356-1819842025-01-10T15:43:52Z Diffense: defense against backdoor attacks on deep neural networks with latent diffusion Hu, Bowen Chang, Chip Hong School of Electrical and Electronic Engineering Engineering Deep neural networks AI security As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process in such a way that the trained model produces a false output for an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps of clean input samples in the possibly infected target DNN. By learning the feature map distribution using the diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense requires no knowledge of the structure, weights, or training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive: the accuracy of the target model on clean inputs is not affected, and the inference service can run uninterrupted with Diffense in place. Extensive experiments were conducted on DNNs trained for MNIST, CIFAR-10, GTSRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense.
The results generally exceed the performance of existing backdoor mitigation methods, including those that require model modifications or prerequisite knowledge of model weights or attack samples. Ministry of Education (MOE) Submitted/Accepted version This work was supported by the Ministry of Education, Singapore, under its Academic Research Fund (AcRF) Tier 2 under Award MOE-T2EP50220-0003. 2025-01-05T03:40:11Z 2025-01-05T03:40:11Z 2024 Journal Article Hu, B. & Chang, C. H. (2024). Diffense: defense against backdoor attacks on deep neural networks with latent diffusion. IEEE Journal On Emerging and Selected Topics in Circuits and Systems, 14(4), 729-742. https://dx.doi.org/10.1109/JETCAS.2024.3469377 2156-3357 https://hdl.handle.net/10356/181984 10.1109/JETCAS.2024.3469377 2-s2.0-85206466120 4 14 729 742 en MOE-T2EP50220-0003 IEEE Journal on Emerging and Selected Topics in Circuits and Systems © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/JETCAS.2024.3469377. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
Deep neural networks
AI security
spellingShingle Engineering
Deep neural networks
AI security
Hu, Bowen
Chang, Chip Hong
Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
description As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process in such a way that the trained model produces a false output for an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps of clean input samples in the possibly infected target DNN. By learning the feature map distribution using the diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense requires no knowledge of the structure, weights, or training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive: the accuracy of the target model on clean inputs is not affected, and the inference service can run uninterrupted with Diffense in place. Extensive experiments were conducted on DNNs trained for MNIST, CIFAR-10, GTSRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense. The results generally exceed the performance of existing backdoor mitigation methods, including those that require model modifications or prerequisite knowledge of model weights or attack samples.
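The detection rule described in the abstract (sample from a generative model of clean latent feature maps under the guidance of the inspected input, then flag inputs whose feature maps lie far from the guided sample) can be illustrated with a minimal sketch. This is not the paper's implementation: the latent diffusion sampler is abstracted here as a hypothetical shrinkage of the inspected feature map toward the clean-feature distribution, and the threshold is calibrated on held-out clean samples; all class and function names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class CleanFeaturePrior:
    """Stand-in for a generative model of clean latent feature maps.

    In Diffense this role is played by a latent diffusion model; here the
    guided sampling step is approximated by pulling the inspected feature
    map toward the clean distribution (a deliberate simplification).
    """

    def fit(self, clean_feats):
        self.mean = clean_feats.mean(axis=0)
        return self

    def guided_sample(self, feat, strength=0.8):
        # A guided sample stays close to the clean manifold while being
        # conditioned on the inspected input.
        return strength * self.mean + (1.0 - strength) * feat

def detect_backdoor(feat, prior, threshold):
    """Flag an input whose feature map is far from its guided sample."""
    sample = prior.guided_sample(feat)
    distance = float(np.linalg.norm(feat - sample))
    return distance > threshold, distance

# Toy data: clean feature maps cluster near the origin; a triggered input
# yields an out-of-distribution feature map (shifted mean).
clean_feats = rng.normal(0.0, 1.0, size=(500, 64))
prior = CleanFeaturePrior().fit(clean_feats)

# Calibrate the threshold on held-out clean samples (e.g. 99th percentile
# of clean distances), so almost all clean inputs pass undisturbed.
calib = rng.normal(0.0, 1.0, size=(200, 64))
calib_dists = [detect_backdoor(f, prior, np.inf)[1] for f in calib]
threshold = float(np.percentile(calib_dists, 99))

trigger_flagged, trigger_dist = detect_backdoor(
    rng.normal(5.0, 1.0, 64), prior, threshold
)
```

Because detection operates only on distances in the latent space, the target model itself is never modified, which matches the non-intrusive property claimed in the abstract.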
author2 School of Electrical and Electronic Engineering
author_facet School of Electrical and Electronic Engineering
Hu, Bowen
Chang, Chip Hong
format Article
author Hu, Bowen
Chang, Chip Hong
author_sort Hu, Bowen
title Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_short Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_full Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_fullStr Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_full_unstemmed Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_sort diffense: defense against backdoor attacks on deep neural networks with latent diffusion
publishDate 2025
url https://hdl.handle.net/10356/181984
_version_ 1821237145005719552