Detection of adversarial attacks via disentangling natural images and perturbations

The vulnerability of deep neural networks to adversarial attacks, i.e., the fact that imperceptible adversarial perturbations can easily give rise to wrong predictions, poses a serious threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images, which can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex; e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits such distinct properties to detect adversarial attacks among normal examples. Specifically, we use a dual-branch disentangling framework to encode the natural-image and perturbation components of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD) measured in the learned latent feature space is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets of increasing data complexity, i.e., CIFAR-10, CIFAR-100, and mini ImageNet, across multiple popular attack strategies. Compared to existing state-of-the-art detection methods, ADDNP demonstrates promising performance on adversarial detection, with significant improvements on more challenging datasets.
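As a rough, hypothetical sketch of the detection idea summarized above (not the authors' released implementation; the module names, layer sizes, CIFAR-like 32x32 input resolution, and the detection threshold are all assumptions), the following PyTorch snippet encodes an input with two branches, reconstructs it jointly, and flags it as adversarial when the reconstruction discrepancy measured in the natural-image latent space is large.

# Hypothetical sketch of reconstruction-discrepancy (RD) based detection;
# architecture details are illustrative, not the ADDNP reference code.
import torch
import torch.nn as nn

class DualBranchAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        def make_encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 8 * 8, latent_dim),  # assumes 32x32 inputs (CIFAR-like)
            )
        self.enc_natural = make_encoder()   # branch for natural-image content
        self.enc_perturb = make_encoder()   # branch for perturbation content
        self.decoder = nn.Sequential(       # joint reconstruction from both codes
            nn.Linear(2 * latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z_nat, z_per = self.enc_natural(x), self.enc_perturb(x)
        recon = self.decoder(torch.cat([z_nat, z_per], dim=1))
        return recon, z_nat

def reconstruction_discrepancy(model, x):
    # Distance in the natural-image latent space between x and its joint reconstruction.
    with torch.no_grad():
        recon, z_x = model(x)
        z_recon = model.enc_natural(recon)
    return torch.norm(z_x - z_recon, dim=1)

# Usage: score a batch and flag inputs whose RD exceeds a threshold calibrated
# on clean validation data (the value 1.0 here is an arbitrary placeholder).
model = DualBranchAE().eval()
batch = torch.rand(4, 3, 32, 32)
rd_scores = reconstruction_discrepancy(model, batch)
is_adversarial = rd_scores > 1.0

In the paper the two branches are trained so that their codes disentangle natural-image content from perturbations; this sketch only illustrates the scoring side, i.e., thresholding the RD computed in the learned latent space.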

Bibliographic Details
Main Authors: Qing, Yuanyuan, Bai, Tao, Liu, Zhuotao, Moulin, Pierre, Wen, Bihan
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2024
Subjects: Engineering; Adversarial detection; Representation learning
Online Access: https://hdl.handle.net/10356/178082
Institution: Nanyang Technological University
id sg-ntu-dr.10356-178082
record_format dspace
type Journal Article
citation Qing, Y., Bai, T., Liu, Z., Moulin, P. & Wen, B. (2024). Detection of adversarial attacks via disentangling natural images and perturbations. IEEE Transactions on Information Forensics and Security, 19, 2814-2825. https://dx.doi.org/10.1109/TIFS.2024.3352837
journal IEEE Transactions on Information Forensics and Security
issn 1556-6013
doi 10.1109/TIFS.2024.3352837
scopus 2-s2.0-85182921741
handle https://hdl.handle.net/10356/178082
funding Ministry of Education (MOE); this work was supported in part by the Singapore Ministry of Education AcRF Tier 1 under Grant RG61/22 and a Start-Up Grant
date_deposited 2024-06-04
rights © 2024 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
Adversarial detection
Representation learning