Detection of adversarial attacks via disentangling natural images and perturbations

The vulnerability of deep neural networks to adversarial attacks, i.e., the fact that imperceptible adversarial perturbations can easily give rise to wrong predictions, poses a serious threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images, which can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex; e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits such distinct properties to detect adversarial attacks among normal examples. Specifically, we use a dual-branch disentangling framework to encode the natural-image and perturbation components of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD) measured in the learned latent feature space is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets of increasing data complexity, i.e., CIFAR-10, CIFAR-100, and mini ImageNet, across multiple popular attack strategies. Compared to existing state-of-the-art detection methods, ADDNP demonstrates promising performance on adversarial detection, with significant improvements on more challenging datasets.
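As a rough, hypothetical sketch of the detection idea summarized above (not the authors' released implementation; the module names, layer sizes, CIFAR-like 32x32 input resolution, and the detection threshold are all assumptions), the following PyTorch snippet encodes an input with two branches, reconstructs it jointly, and flags it as adversarial when the reconstruction discrepancy measured in the natural-image latent space is large.

# Hypothetical sketch of reconstruction-discrepancy (RD) based detection;
# architecture details are illustrative, not the ADDNP reference code.
import torch
import torch.nn as nn

class DualBranchAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        def make_encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 8 * 8, latent_dim),  # assumes 32x32 inputs (CIFAR-like)
            )
        self.enc_natural = make_encoder()   # branch for natural-image content
        self.enc_perturb = make_encoder()   # branch for perturbation content
        self.decoder = nn.Sequential(       # joint reconstruction from both codes
            nn.Linear(2 * latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z_nat, z_per = self.enc_natural(x), self.enc_perturb(x)
        recon = self.decoder(torch.cat([z_nat, z_per], dim=1))
        return recon, z_nat

def reconstruction_discrepancy(model, x):
    # Distance in the natural-image latent space between x and its joint reconstruction.
    with torch.no_grad():
        recon, z_x = model(x)
        z_recon = model.enc_natural(recon)
    return torch.norm(z_x - z_recon, dim=1)

# Usage: score a batch and flag inputs whose RD exceeds a threshold calibrated
# on clean validation data (the value 1.0 here is an arbitrary placeholder).
model = DualBranchAE().eval()
batch = torch.rand(4, 3, 32, 32)
rd_scores = reconstruction_discrepancy(model, batch)
is_adversarial = rd_scores > 1.0

In the paper the two branches are trained so that their codes disentangle natural-image content from perturbations; this sketch only illustrates the scoring side, i.e., thresholding the RD computed in the learned latent space.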

Bibliographic Details
Main Authors: Qing, Yuanyuan, Bai, Tao, Liu, Zhuotao, Moulin, Pierre, Wen, Bihan
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2024
Subjects: Engineering; Adversarial detection; Representation learning
Online Access: https://hdl.handle.net/10356/178082
Institution: Nanyang Technological University
id sg-ntu-dr.10356-178082
record_format dspace
type Journal Article
citation Qing, Y., Bai, T., Liu, Z., Moulin, P. & Wen, B. (2024). Detection of adversarial attacks via disentangling natural images and perturbations. IEEE Transactions on Information Forensics and Security, 19, 2814-2825. https://dx.doi.org/10.1109/TIFS.2024.3352837
journal IEEE Transactions on Information Forensics and Security
issn 1556-6013
doi 10.1109/TIFS.2024.3352837
scopus 2-s2.0-85182921741
handle https://hdl.handle.net/10356/178082
funding Ministry of Education (MOE); this work was supported in part by the Singapore Ministry of Education AcRF Tier 1 under Grant RG61/22 and a Start-Up Grant
date_deposited 2024-06-04
rights © 2024 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
Adversarial detection
Representation learning