Detection of adversarial attacks via disentangling natural images and perturbations

The vulnerability of deep neural networks against adversarial attacks, i.e., imperceptible adversarial perturbations can easily give rise to wrong predictions, poses a huge threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling N...

Full description

Saved in:
Bibliographic Details
Main Authors: Qing, Yuanyuan, Bai, Tao, Liu, Zhuotao, Moulin, Pierre, Wen, Bihan
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language:English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/178082
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:The vulnerability of deep neural networks against adversarial attacks, i.e., imperceptible adversarial perturbations can easily give rise to wrong predictions, poses a huge threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images that can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex, e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits such distinct properties for the detection of adversarial attacks amongst normal examples. Specifically, we use a dual-branch disentangling framework to encode natural images and perturbations of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD) measured in the learned latent feature space is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets, i.e., CIFAR-10, CIFAR-100, and mini ImageNet with increasing data complexity, across multiple popular attack strategies. Compared to the existing and state-of-the-art detection methods, ADDNP has demonstrated promising performance on adversarial detection, with significant improvements on more challenging datasets.