Fired neuron rate based decision tree for detection of adversarial examples in DNNs
Main Authors:
Other Authors:
Format: Conference or Workshop Item
Language: English
Published: 2020
Subjects:
Online Access: https://hdl.handle.net/10356/144346 https://doi.org/10.21979/N9/YPY0EB
Institution: Nanyang Technological University
Summary: Deep neural networks (DNNs) are a prevalent machine learning solution to computer vision problems. The most criticized vulnerability of deep learning is its susceptibility to adversarial images, crafted by maliciously adding infinitesimal distortions to benign inputs. Such negative examples can fool a classifier. Existing countermeasures against these adversarial attacks are developed mainly on the software model of DNNs: modifying the training during learning or the input during testing, modifying the network or its loss/activation functions, or relying on add-on models to classify unseen examples. These approaches do not consider optimization for hardware implementation of the learning models. In this paper, a new thresholding method is proposed based on comparators integrated into the most discriminative layers of the DNN, where those layers are identified by how strongly their layer-wise fired neuron rates differ between adversarial and normal inputs. The effectiveness of the method is validated on the ImageNet dataset with 8-bit truncated models of state-of-the-art DNN architectures. A high detection rate of up to 98% is achieved with a false positive rate of only 4.5%. The results show a significant improvement in both detection rate and false positive rate over previous countermeasures against the most practical non-invasive universal perturbation attack on a deep learning based AI chip.
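
As a rough illustration of the fired neuron rate idea described in the summary (a minimal sketch, not the authors' implementation; the layer names, threshold ranges, and decision rule below are hypothetical), the rate of a layer can be computed as the fraction of post-activation values that are non-zero, and an input flagged when the rates of the monitored layers leave their calibrated normal ranges:

```python
import numpy as np

def fired_neuron_rate(activations: np.ndarray) -> float:
    # Fraction of neurons in a layer that "fire", i.e. produce a non-zero
    # (post-ReLU) activation for the current input.
    return float(np.count_nonzero(activations > 0)) / activations.size

def looks_adversarial(layer_activations: dict, bounds: dict) -> bool:
    # Hypothetical decision rule: flag the input if the fired neuron rate of
    # any monitored (most discriminative) layer falls outside its calibrated
    # normal range. The paper builds a decision tree over such comparators;
    # this sketch only shows a simple per-layer range check.
    for layer, (low, high) in bounds.items():
        rate = fired_neuron_rate(layer_activations[layer])
        if not (low <= rate <= high):
            return True
    return False

# Example with made-up activations and threshold ranges for two monitored layers.
acts = {
    "conv3": np.maximum(np.random.randn(256, 14, 14), 0),
    "fc1": np.maximum(np.random.randn(4096), 0),
}
bounds = {"conv3": (0.2, 0.6), "fc1": (0.1, 0.5)}
print(looks_adversarial(acts, bounds))
```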