Error-correcting output codes with ensemble diversity for robust learning in neural networks
Though deep learning has been applied successfully in many scenarios, malicious inputs with human-imperceptible perturbations can make it vulnerable in real applications. This paper proposes an error-correcting neural network (ECNN) that combines a set of binary classifiers to combat adversarial exam...
Saved in:
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Conference or Workshop Item |
Language: | English |
Published: |
2021
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/147336 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | Though deep learning has been applied successfully in many scenarios, malicious inputs with human-imperceptible perturbations can make it vulnerable in real applications. This paper proposes an error-correcting neural network (ECNN) that combines a set of binary classifiers to combat adversarial examples in the multi-class classification problem. To build an ECNN, we propose to design a code matrix so that the minimum Hamming distance between any two rows (i.e., two codewords) and the minimum shared information distance between any two columns (i.e., two partitions of class labels) are simultaneously maximized. Maximizing row distances can increase the system fault tolerance while maximizing column distances helps increase the diversity between binary classifiers. We propose an end-to-end training method for our ECNN, which allows further improvement of the diversity between binary classifiers. The end-to-end training renders our proposed ECNN different from the traditional error-correcting output code (ECOC) based methods that train binary classifiers independently. ECNN is complementary to other existing defense approaches such as adversarial training and can be applied in conjunction with them. We empirically demonstrate that our proposed ECNN is effective against the state-of-the-art white-box and black-box attacks on several datasets while maintaining good classification accuracy on normal examples. |
---|