Detection of attacks on artificial intelligence systems

Bibliographic Details
Main Author: Pan, Siyu
Other Authors: Wen Bihan
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/152977
Institution: Nanyang Technological University
Description
Summary: Artificial intelligence (AI) is gradually and profoundly changing production and daily life, and is widely used in fields such as visual information processing, autonomous systems, and safety diagnosis and protection. Security will eventually become its biggest challenge, and the adversarial attack is a powerful security threat to Deep Neural Networks (DNNs). This dissertation focuses on a passive defence method: the detection of adversarial samples. An adversarial sample differs fundamentally from a normal sample in that it lies in a high-dimensional continuous space whose dimension is much larger than the intrinsic dimensionality of any given data submanifold. Focusing on Local Intrinsic Dimensionality (LID), a better detector, the LID-based classifier, is studied. Four attack methods were used to conduct experiments on two common datasets. The experiments show that the LID-based classifier substantially outperforms the single-characteristic classifiers based on Kernel Density (KD) and Bayesian Network Uncertainty (BNU), as well as the combined KD&BNU classifier: the improvement is up to 37.44% over the single-characteristic classifiers and up to 18.65% over the combined classifier. The dissertation then shows that an LID-based classifier trained on one attack can detect adversarial samples generated by other attack methods. Classifiers trained on weaker attacks perform better against adversarial samples generated by stronger attacks than they do when tested under the same attack, and vice versa. This demonstrates that the LID-based classifier is an effective means of detecting adversarial samples, with a degree of universality and transferability. Finally, the dissertation makes predictions and recommendations for possible directions of future work.
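
The record does not spell out how the LID characteristic is computed, but a commonly used estimator in LID-based adversarial-sample detection is the maximum-likelihood estimate obtained from k-nearest-neighbour distances within a minibatch of DNN activations. The Python/NumPy sketch below illustrates that estimator under this assumption; the function name batch_lid_mle, the neighbourhood size k, and the synthetic data are hypothetical and are not taken from the dissertation.

    import numpy as np

    def batch_lid_mle(activations, k=20):
        """Maximum-likelihood estimate of Local Intrinsic Dimensionality (LID)
        for every sample in a minibatch of activations (shape: n x d).

        LID_hat(x) = -( (1/k) * sum_{i=1}^{k} log(r_i(x) / r_k(x)) )^{-1},
        where r_1 <= ... <= r_k are the distances from x to its k nearest
        neighbours within the batch (the sample itself is excluded).
        """
        x = np.asarray(activations, dtype=np.float64)
        # Pairwise Euclidean distances within the batch.
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        # Sort each row; column 0 is the zero self-distance, so keep columns 1..k.
        knn = np.sort(dists, axis=1)[:, 1:k + 1]
        r_k = knn[:, -1:]  # distance to the k-th nearest neighbour
        # MLE of LID; the epsilon guards against log(0) for duplicate points.
        return -1.0 / np.mean(np.log(knn / r_k + 1e-12), axis=1)

    if __name__ == "__main__":
        # Illustrative usage: adversarial samples tend to receive larger LID
        # estimates than clean ones, so these scores can feed a downstream detector.
        rng = np.random.default_rng(0)
        clean = rng.normal(size=(100, 64))
        print(batch_lid_mle(clean, k=20)[:5])

In the detection setting described in the summary, such per-sample LID scores (typically computed at several network layers) would serve as features for the classifier that separates adversarial from normal samples.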