Evaluate the effects of model post-processing on adversarial examples

Bibliographic Details
Main Author: Low, Gerald
Other Authors: Chang, Chip Hong
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/167310
Institution: Nanyang Technological University
Description
Summary: In recent years, deep learning models, particularly Convolutional Neural Networks (CNNs), have achieved remarkable success in a variety of applications such as image recognition, object detection, and autonomous systems. Despite CNNs' ability to solve a plethora of problems in the field of Computer Vision, they are not immune to adversarial attacks, which can significantly degrade a model's performance. Adversarial attacks are malicious perturbations added to the input data to deceive the model into making wrong decisions. These attacks are introduced by slightly modifying the input, producing adversarial samples that may be imperceptible to the human eye but can still drastically change the model's prediction. Several methods have been proposed to improve the robustness of deep learning models against adversarial attacks, such as attaching additional models that detect adversarial samples as a network add-on to the original model, or changing the original model's architecture by adding more layers. However, these methods can be computationally demanding and incur a higher training cost. There is therefore a need for defense strategies that incur less training cost but still increase the model's robustness. In this project, we focus on post-processing methods and investigate their effectiveness in improving the model's robustness against two types of adversarial attacks: the Fast Gradient Sign Method (FGSM) and the Carlini and Wagner (C&W) attack. The post-processing techniques implemented in this project are fine-tuning, weight pruning, and filter pruning. By comparing the success rate of adversarial attacks on the post-processed models with that on the original model, we evaluate the effectiveness of these techniques in mitigating adversarial attacks.
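
To illustrate the attack setting described in the summary, the following is a minimal FGSM sketch in PyTorch; the model, loss function, and epsilon value are assumptions for illustration and are not taken from the project's code.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=0.03):
        # Illustrative FGSM: x_adv = x + epsilon * sign(grad_x loss(model(x), y)).
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step each pixel in the direction that increases the loss,
        # then clamp back to the valid image range [0, 1].
        x_adv = x + epsilon * x.grad.sign()
        return x_adv.clamp(0, 1).detach()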
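
Similarly, the weight-pruning and filter-pruning post-processing steps could be sketched with PyTorch's pruning utilities; the pruning amounts and the choice to prune only convolutional layers below are illustrative assumptions, not the project's settings.

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    def prune_conv_layers(model, weight_amount=0.3, filter_amount=0.2):
        # Illustrative post-processing: prune every Conv2d layer in the model.
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                # Weight pruning: zero out the smallest individual weights by L1 magnitude.
                prune.l1_unstructured(module, name="weight", amount=weight_amount)
                # Filter pruning: zero out whole output filters by L2 norm along dim 0.
                prune.ln_structured(module, name="weight", amount=filter_amount, n=2, dim=0)
                # Fold the accumulated masks back into the weight tensor.
                prune.remove(module, "weight")
        return model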