Diffense: defense against backdoor attacks on deep neural networks with latent diffusion

As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attac...

Bibliographic Details
Main Authors: Hu, Bowen, Chang, Chip Hong
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2025
Subjects:
Online Access:https://hdl.handle.net/10356/181984
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-181984
record_format dspace
spelling sg-ntu-dr.10356-1819842025-01-10T15:43:52Z Diffense: defense against backdoor attacks on deep neural networks with latent diffusion Hu, Bowen Chang, Chip Hong School of Electrical and Electronic Engineering Engineering Deep neural networks AI security As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process in such a way that the trained model produces a false output for an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps of clean input samples in the possibly infected target DNN. By learning the feature map distribution using the diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense requires no knowledge of the structure, weights, or training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive: the accuracy of the target model on clean inputs is not affected, and the inference service can run uninterrupted with Diffense in place. Extensive experiments were conducted on DNNs trained for MNIST, CIFAR-10, GTSRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense.
The results generally exceed the performance of existing backdoor mitigation methods, including those that require model modifications or prerequisite knowledge of model weights or attack samples. Ministry of Education (MOE) Submitted/Accepted version This work was supported by the Ministry of Education, Singapore, under its Academic Research Fund (AcRF) Tier 2 under Award MOE-T2EP50220-0003. 2025-01-05T03:40:11Z 2025-01-05T03:40:11Z 2024 Journal Article Hu, B. & Chang, C. H. (2024). Diffense: defense against backdoor attacks on deep neural networks with latent diffusion. IEEE Journal On Emerging and Selected Topics in Circuits and Systems, 14(4), 729-742. https://dx.doi.org/10.1109/JETCAS.2024.3469377 2156-3357 https://hdl.handle.net/10356/181984 10.1109/JETCAS.2024.3469377 2-s2.0-85206466120 4 14 729 742 en MOE-T2EP50220-0003 IEEE Journal on Emerging and Selected Topics in Circuits and Systems © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/JETCAS.2024.3469377. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
Deep neural networks
AI security
spellingShingle Engineering
Deep neural networks
AI security
Hu, Bowen
Chang, Chip Hong
Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
description As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process in such a way that the trained model produces a false output for an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps of clean input samples in the possibly infected target DNN. By learning the feature map distribution using the diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense requires no knowledge of the structure, weights, or training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive: the accuracy of the target model on clean inputs is not affected, and the inference service can run uninterrupted with Diffense in place. Extensive experiments were conducted on DNNs trained for MNIST, CIFAR-10, GTSRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense. The results generally exceed the performance of existing backdoor mitigation methods, including those that require model modifications or prerequisite knowledge of model weights or attack samples.
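The detection rule described in the abstract (sample from a generative model of clean latent feature maps under the guidance of the inspected input, then flag inputs whose feature maps lie far from the guided sample) can be illustrated with a minimal sketch. This is not the paper's implementation: the latent diffusion sampler is abstracted here as a hypothetical shrinkage of the inspected feature map toward the clean-feature distribution, and the threshold is calibrated on held-out clean samples; all class and function names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class CleanFeaturePrior:
    """Stand-in for a generative model of clean latent feature maps.

    In Diffense this role is played by a latent diffusion model; here the
    guided sampling step is approximated by pulling the inspected feature
    map toward the clean distribution (a deliberate simplification).
    """

    def fit(self, clean_feats):
        self.mean = clean_feats.mean(axis=0)
        return self

    def guided_sample(self, feat, strength=0.8):
        # A guided sample stays close to the clean manifold while being
        # conditioned on the inspected input.
        return strength * self.mean + (1.0 - strength) * feat

def detect_backdoor(feat, prior, threshold):
    """Flag an input whose feature map is far from its guided sample."""
    sample = prior.guided_sample(feat)
    distance = float(np.linalg.norm(feat - sample))
    return distance > threshold, distance

# Toy data: clean feature maps cluster near the origin; a triggered input
# yields an out-of-distribution feature map (shifted mean).
clean_feats = rng.normal(0.0, 1.0, size=(500, 64))
prior = CleanFeaturePrior().fit(clean_feats)

# Calibrate the threshold on held-out clean samples (e.g. 99th percentile
# of clean distances), so almost all clean inputs pass undisturbed.
calib = rng.normal(0.0, 1.0, size=(200, 64))
calib_dists = [detect_backdoor(f, prior, np.inf)[1] for f in calib]
threshold = float(np.percentile(calib_dists, 99))

trigger_flagged, trigger_dist = detect_backdoor(
    rng.normal(5.0, 1.0, 64), prior, threshold
)
```

Because detection operates only on distances in the latent space, the target model itself is never modified, which matches the non-intrusive property claimed in the abstract.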
author2 School of Electrical and Electronic Engineering
author_facet School of Electrical and Electronic Engineering
Hu, Bowen
Chang, Chip Hong
format Article
author Hu, Bowen
Chang, Chip Hong
author_sort Hu, Bowen
title Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_short Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_full Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_fullStr Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_full_unstemmed Diffense: defense against backdoor attacks on deep neural networks with latent diffusion
title_sort diffense: defense against backdoor attacks on deep neural networks with latent diffusion
publishDate 2025
url https://hdl.handle.net/10356/181984
_version_ 1821237145005719552