Attack as detection: Using adversarial attack methods to detect abnormal examples

As a new programming paradigm, deep learning (DL) has achieved impressive performance in areas such as image processing and speech recognition, and its applications have expanded to many real-world problems. However, neural networks and DL systems are normally black boxes; worse, DL-based software is vulnerable to abnormal examples, such as adversarial and backdoored examples constructed by attackers with malicious intent, as well as unintentionally mislabeled samples. Detecting such abnormal examples is therefore both important and urgent. Although various detection approaches have been proposed, each addressing specific types of abnormal examples, they suffer from limitations, and the problem remains of considerable interest. In this work, we first propose a novel characterization that distinguishes abnormal examples from normal ones, based on the observation that abnormal examples have significantly different (adversarial) robustness than normal ones. We systematically analyze the three types of abnormal samples above in terms of robustness and find that their characteristics differ from those of normal samples. Because measuring robustness directly is computationally expensive and hence hard to scale to large networks, we propose to measure the robustness of an input effectively and efficiently via the cost of adversarially attacking it, a technique originally proposed for testing the robustness of neural networks against adversarial examples. We then propose a novel detection method, named attack as detection (A2D for short), which uses the cost of adversarially attacking an input, rather than its robustness, to decide whether the input is abnormal. The method is generic: various adversarial attack methods can be leveraged. Extensive experiments show that A2D is more effective than recent promising approaches, each of which was proposed to detect only one specific type of abnormal example. We also thoroughly discuss possible adaptive attacks against our detection method and show that A2D remains effective against carefully designed adaptive adversarial attacks; for example, the attack success rate drops to 0% on CIFAR10.
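
The mechanism the abstract describes can be sketched briefly: measure how hard it is to adversarially perturb an input into a different prediction, and flag inputs that are unusually easy to attack. Below is a minimal, hypothetical Python/PyTorch sketch of that idea, not the authors' implementation from the paper: the PGD-style attack, the iteration-count cost, the hyperparameters, and the thresholding scheme are all illustrative assumptions.

```python
# Hypothetical sketch of the "attack as detection" idea: the attack, the cost
# measure (iteration count), and all hyperparameters are illustrative only.
import torch
import torch.nn.functional as F


def attack_cost(model, x, eps=0.03, step=0.007, max_iters=50):
    """Number of PGD-style iterations needed to change the model's prediction on x.

    x is a single input with a batch dimension (shape [1, ...]). A small cost means
    x is easy to attack (low robustness), which, per the paper's observation, is a
    signal that x may be abnormal. Returns max_iters if the prediction never flips.
    """
    model.eval()
    with torch.no_grad():
        y0 = model(x).argmax(dim=1)  # original prediction, used as the attack target
    x_adv = x.clone().detach()
    for i in range(1, max_iters + 1):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y0)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()             # gradient-sign step
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project into the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # stay in valid pixel range
            if model(x_adv).argmax(dim=1).item() != y0.item():
                return i  # prediction flipped after i steps: the attack "cost"
    return max_iters


def is_abnormal(model, x, threshold):
    """Flag x as abnormal when its attack cost falls below a calibrated threshold.

    The threshold would be calibrated from attack costs observed on known-normal
    samples (e.g., a validation set); that calibration step is assumed, not shown.
    """
    return attack_cost(model, x) < threshold
```

In practice the threshold would be calibrated from the attack costs observed on known-normal data, and, as the abstract notes, the method is generic in the attack it uses, so other adversarial attack methods could serve as the cost measure.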

Bibliographic Details
Main Authors: ZHAO, Zhe; CHEN, Guangke; LIU, Tong; LI, Taishan; SONG, Fu; WANG, Jingyi; SUN, Jun
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects: Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/9212
https://ink.library.smu.edu.sg/context/sis_research/article/10218/viewcontent/TOSEM23.pdf
Institution: Singapore Management University
DOI: 10.1145/3631977
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Date: 2024-03-01
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Content Provider: SMU Libraries, Singapore
Record ID: sg-smu-ink.sis_research-10218