Detection of adversarial attacks via disentangling natural images and perturbations
Deep neural networks are vulnerable to adversarial attacks, in which imperceptible adversarial perturbations can easily cause wrong predictions, and this vulnerability poses a serious threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling N...
Main Authors: | Qing, Yuanyuan; Bai, Tao; Liu, Zhuotao; Moulin, Pierre; Wen, Bihan |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering; Adversarial detection; Representation learning |
Online Access: | https://hdl.handle.net/10356/178082 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-178082 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1780822024-06-04T05:47:04Z Detection of adversarial attacks via disentangling natural images and perturbations Qing, Yuanyuan Bai, Tao Liu, Zhuotao Moulin, Pierre Wen, Bihan School of Electrical and Electronic Engineering Engineering Adversarial detection Representation learning The vulnerability of deep neural networks against adversarial attacks, i.e., imperceptible adversarial perturbations can easily give rise to wrong predictions, poses a huge threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images that can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex, e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits such distinct properties for the detection of adversarial attacks amongst normal examples. Specifically, we use a dual-branch disentangling framework to encode natural images and perturbations of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD) measured in the learned latent feature space is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets, i.e., CIFAR-10, CIFAR-100, and mini ImageNet with increasing data complexity, across multiple popular attack strategies. Compared to the existing and state-of-the-art detection methods, ADDNP has demonstrated promising performance on adversarial detection, with significant improvements on more challenging datasets. Ministry of Education (MOE) This work was supported in part by the Singapore Ministry of Education AcRF Tier 1 under Grant RG61/22 and Start-Up Grant. 
2024-06-04T05:47:03Z 2024-06-04T05:47:03Z 2024 Journal Article Qing, Y., Bai, T., Liu, Z., Moulin, P. & Wen, B. (2024). Detection of adversarial attacks via disentangling natural images and perturbations. IEEE Transactions On Information Forensics and Security, 19, 2814-2825. https://dx.doi.org/10.1109/TIFS.2024.3352837 1556-6013 https://hdl.handle.net/10356/178082 10.1109/TIFS.2024.3352837 2-s2.0-85182921741 19 2814 2825 en RG61/22 IEEE Transactions on Information Forensics and Security © 2024 IEEE. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering Adversarial detection Representation learning |
spellingShingle |
Engineering Adversarial detection Representation learning Qing, Yuanyuan Bai, Tao Liu, Zhuotao Moulin, Pierre Wen, Bihan Detection of adversarial attacks via disentangling natural images and perturbations |
description |
Deep neural networks are vulnerable to adversarial attacks, in which imperceptible adversarial perturbations can easily cause wrong predictions, and this vulnerability poses a serious threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Whereas natural images can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex; e.g., the adversarial counterparts of a single normal example generated by different attack strategies can differ significantly. The proposed ADDNP exploits these distinct properties to detect adversarial attacks among normal examples. Specifically, we use a dual-branch disentangling framework to encode the natural image and the perturbation of each input separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD) measured in the learned latent feature space is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets of increasing data complexity, i.e., CIFAR-10, CIFAR-100, and mini ImageNet, across multiple popular attack strategies. Compared with existing state-of-the-art detection methods, ADDNP demonstrates promising performance on adversarial detection, with significant improvements on the more challenging datasets. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Qing, Yuanyuan Bai, Tao Liu, Zhuotao Moulin, Pierre Wen, Bihan |
format |
Article |
author |
Qing, Yuanyuan Bai, Tao Liu, Zhuotao Moulin, Pierre Wen, Bihan |
author_sort |
Qing, Yuanyuan |
title |
Detection of adversarial attacks via disentangling natural images and perturbations |
title_short |
Detection of adversarial attacks via disentangling natural images and perturbations |
title_full |
Detection of adversarial attacks via disentangling natural images and perturbations |
title_fullStr |
Detection of adversarial attacks via disentangling natural images and perturbations |
title_full_unstemmed |
Detection of adversarial attacks via disentangling natural images and perturbations |
title_sort |
detection of adversarial attacks via disentangling natural images and perturbations |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/178082 |
_version_ |
1806059869156933632 |
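
Illustrative note on the description field: the abstract describes scoring an input by its reconstruction discrepancy (RD) after a dual-branch encode and joint reconstruction. The sketch below shows only the general shape of such a detector and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the choice of an L2 distance over concatenated branch codes, the quantile-based threshold, and the toy linear "encoders" that stand in for the learned networks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reconstruction_discrepancy(
    x: Vector,
    encode_natural: Callable[[Vector], Vector],
    encode_perturbation: Callable[[Vector], Vector],
    decode: Callable[[Vector, Vector], Vector],
) -> float:
    """Latent-space distance between an input and its joint reconstruction.

    ASSUMPTION: the latent code is the concatenation of both branch codes and
    the distance is L2; the abstract does not specify either choice.
    """
    z_nat, z_pert = encode_natural(x), encode_perturbation(x)
    x_hat = decode(z_nat, z_pert)  # joint reconstruction from both branches
    z_hat = encode_natural(x_hat) + encode_perturbation(x_hat)
    return l2_distance(z_nat + z_pert, z_hat)


def calibrate_threshold(clean_scores: Sequence[float], false_positive_rate: float) -> float:
    """Pick a threshold from clean-data scores (hypothetical calibration rule)."""
    ranked = sorted(clean_scores)
    index = min(len(ranked) - 1, int((1.0 - false_positive_rate) * len(ranked)))
    return ranked[index]


# --- Hypothetical toy stand-ins, NOT the networks from the paper ---
U = [0.5, 0.5, 0.5, 0.5]  # unit vector spanning a 1-D "natural image" subspace


def toy_encode_natural(x: Vector) -> Vector:
    """Coefficient of x along the toy natural-image subspace."""
    return [sum(xi * ui for xi, ui in zip(x, U))]


def toy_encode_perturbation(x: Vector) -> Vector:
    """Deliberately imperfect branch: keeps half of the off-subspace residual."""
    coeff = toy_encode_natural(x)[0]
    return [0.5 * (xi - coeff * ui) for xi, ui in zip(x, U)]


def toy_decode(z_nat: Vector, z_pert: Vector) -> Vector:
    """Joint reconstruction: subspace component plus encoded residual."""
    return [z_nat[0] * ui + pi for ui, pi in zip(U, z_pert)]


if __name__ == "__main__":
    branches = (toy_encode_natural, toy_encode_perturbation, toy_decode)
    clean = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
    threshold = calibrate_threshold(
        [reconstruction_discrepancy(x, *branches) for x in clean], 0.05
    )
    suspect = [2.4, 1.6, 2.0, 2.0]  # a clean sample plus a small perturbation
    score = reconstruction_discrepancy(suspect, *branches)
    print("flagged as adversarial:", score > threshold)
```

In this toy setup, inputs that lie on the assumed subspace reconstruct exactly and score zero, while an off-subspace perturbation leaves a nonzero discrepancy. A real detector would replace the three toy functions with trained networks and calibrate the threshold on held-out clean data.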