Evaluation of adversarial attacks against deep learning models


Bibliographic Details
Main Author: Lam, Sabrina Jing Wen
Other Authors: Zhang Tianwei
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access:https://hdl.handle.net/10356/165158
Institution: Nanyang Technological University
Description
Summary: The aim of this project is to evaluate and improve adversarial attacks against deep learning models. Specifically, it focuses on attacks against image-processing and image-classification models, referring to defence methods only for context. Evaluation and improvement are defined here as reducing attack time under the same computing resources and target model. A single adversarial attack was selected for in-depth study, so understanding the range of adversarial attacks on image-classification models was essential in making that choice. The Projected Gradient Descent (PGD) attack was chosen for detailed evaluation and improvement. The PGD attack explored in this project is a white-box, untargeted attack under an ℓ∞ bound, and two datasets were used: ImageNet and CIFAR-10.

Methods to improve the effectiveness of the PGD attack, with the aim of reducing attack time, were designed and implemented as attack variants. Four PGD variants were crafted and compared against standard PGD attacks in two areas: attack time and attack success rate (ASR). The variants centre on randomness and decay in the step size, as well as the choice of surrogate loss function for the PGD attack. In total, six PGD attacks were crafted (the four variants and two standard PGD attacks for comparison), and each was implemented on both datasets, ImageNet and CIFAR-10, giving twelve attacks in all.

The experimental results show that the PGD variants do improve effectiveness compared with the standard PGD attacks; however, each variant improves only one of the two measures, either attack time or ASR. Preliminary refinements to the variants were also attempted, and because these refinements were brief, further research is needed to improve the variants reliably. The results also show that the properties of the variants, together with the choice of surrogate loss function, produce different outcomes on the two datasets. Overall, this project demonstrates that randomness and decay in the step size are important factors in improving the effectiveness of the PGD attack, and that certain surrogate loss functions are better suited to, and more effective on, particular datasets. This project opens opportunities for future work.
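For illustration only, the sketch below shows a white-box, untargeted PGD attack under an ℓ∞ bound with random initialisation and an optional per-step decay of the step size, the two ideas the abstract highlights. This is not the project's own code; the function name, parameter names, and default values (eps, alpha, steps, decay) are assumptions chosen for the example, and the surrogate loss shown is plain cross-entropy.

```python
# Minimal PGD (L-infinity) sketch in PyTorch, assuming inputs normalised to [0, 1].
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10,
             random_start=True, decay=1.0):
    """Return an adversarial example x_adv with ||x_adv - x||_inf <= eps."""
    x_adv = x.clone().detach()
    if random_start:
        # Random initialisation inside the eps-ball (the "randomness" idea).
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)

    step = alpha
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)   # surrogate loss
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            # Ascend the loss, then project back into the eps-ball around x.
            x_adv = x_adv + step * grad.sign()
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)

        step *= decay  # shrink the step size each iteration (the "decay" idea)
    return x_adv.detach()
```

With decay=1.0 this reduces to a standard fixed-step PGD attack; a value below 1.0 gradually shrinks the step size, which is one plausible way to trade off convergence speed (attack time) against final success rate (ASR), the two measures compared in the project.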