Towards robust explainability of deep neural networks against attribution attacks


Bibliographic Details
Main Author: Wang, Fan
Other Authors: Kong Wai-Kin, Adams
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/175394
Institution: Nanyang Technological University
Description
Summary: Deep learning techniques have been rapidly developed and widely applied in various fields. However, the black-box nature of deep neural networks (DNNs) makes it difficult to understand their decision-making process, giving rise to the field of explainable artificial intelligence (XAI). Attribution methods are among the most popular XAI methods; they aim to explain a DNN's prediction by attributing it to the input features. Unfortunately, these attribution methods are vulnerable to adversarial attacks, which can mislead the attribution results. To address this problem, this thesis develops attribution protection methods to defend against such attacks, using both empirical and theoretical approaches. The empirical approaches improve attribution robustness, while the theoretical approaches characterize the worst-case attribution deviation after the inputs are perturbed. The effectiveness of the proposed methods is established through rigorous analysis and proofs, and their performance is validated on various datasets and against different types of attacks, in comparison with state-of-the-art methods.
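
For readers unfamiliar with attribution methods, the following is a minimal sketch of a simple gradient-based saliency attribution, a common baseline in this literature; it illustrates the general idea of attributing a prediction to input features and is not the specific method developed in this thesis. The model, tensor shapes, and function name are hypothetical and chosen only for illustration, assuming a PyTorch setting.

    # Minimal sketch: gradient-based saliency attribution (illustrative baseline only).
    import torch
    import torch.nn as nn

    def saliency_attribution(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
        """Return |d score_target / d x| as a per-feature attribution map."""
        x = x.clone().detach().requires_grad_(True)
        score = model(x)[0, target]      # scalar logit for the target class
        score.backward()                 # gradient of that logit w.r.t. the input
        return x.grad.detach().abs()     # attribution = gradient magnitude per input feature

    # Toy usage with a hypothetical model and input shape.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x = torch.randn(1, 3, 32, 32)
    attr = saliency_attribution(model, x, target=3)
    print(attr.shape)  # same shape as the input: torch.Size([1, 3, 32, 32])

Because such attributions depend on local gradients, small adversarial perturbations of x can change the attribution map substantially while leaving the predicted class unchanged, which is the vulnerability the thesis's protection methods are designed to address.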