Protecting neural networks from adversarial attacks

As modern technology progresses rapidly, more applications are applying machine learning, and especially deep learning, to time-critical and real-world tasks. Adversaries keep finding new ways to exploit attack surfaces in the machine learning pipeline, rendering systems and applications ineffective. Using carefully crafted adversarial examples, an adversary can cause even a well-trained model to fail. Current defence implementations do not appear to cover all attack surfaces, so the concepts behind adversarial attacks need to be understood further before a strategic defence mechanism can be developed. In this project, a theoretical framework is proposed to investigate the concepts behind adversarial examples and to develop a defence strategy against such attacks for image recognition applications. Before formulating this defence strategy, a study is conducted to understand how adversarial attacks work. The study examines how adversarial perturbations are generated for a given image and in which specific regions these perturbations form. A further study examines what a classifier looks at when classifying an image. From the information gathered in these studies, several methods are applied to find these regions and present them as attention maps. Next, an attempt is made to find a spatial correlation between these regions: the idea is to check whether any of them overlap and, if so, whether the area susceptible to adversarial perturbations can be minimized or removed. There are two key experiments in this project. The first analyzes classification performance before and after an attack using the Fast Gradient Sign Method (FGSM). The second uses the trained model to generate all the elements of the proposed theoretical framework. The results are discussed and analyzed in the hope of developing a defence against adversarial attacks.
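For context on the attack named in the abstract, the sketch below shows how a Fast Gradient Sign Method perturbation and a simple gradient-based saliency map (one common stand-in for the attention maps described above) can be computed. It is a minimal illustration assuming a PyTorch image classifier; the function names, the epsilon value, and the choice of gradient saliency are illustrative assumptions, not details taken from the project itself.

    # Minimal FGSM and saliency-map sketch (assumed PyTorch setup, not the project's code).
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, image, label, epsilon=0.03):
        # Perturb each pixel by +/- epsilon along the sign of the loss gradient.
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

    def saliency_map(model, image, label):
        # Gradient magnitude w.r.t. the input: a rough "where the classifier looks" map.
        image = image.clone().detach().requires_grad_(True)
        model(image)[0, label].sum().backward()
        return image.grad.abs().max(dim=1)[0]

    # Usage (illustrative): image is a (1, 3, H, W) tensor in [0, 1],
    # label a LongTensor of shape (1,).
    # adv = fgsm_attack(model, image, label)
    # attn = saliency_map(model, image, label)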


Bibliographic Details
Main Author: Tan, Bryan Bing Xing
Other Authors: Anupam Chattopadhyay
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/137938
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
Degree: Bachelor of Engineering (Computer Science)
Project Code: SCSE19-0303
Collection: DR-NTU (NTU Library, Singapore)