Neural network semantic backdoor detection and mitigation: A causality-based approach

Bibliographic Details
Main Authors:	SUN, Bing, SUN, Jun, KOH, Wayne, SHI, Jie
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	OS and Networks; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9211 https://ink.library.smu.edu.sg/context/sis_research/article/10217/viewcontent/sec23winter_prepub_118_sun.pdf
Institution:	Singapore Management University

Collection:	Research Collection School Of Computing and Information Systems, InK@SMU (SMU Libraries)
Date:	2024-08-01
File format:	application/pdf
License:	http://creativecommons.org/licenses/by-nc-nd/4.0/

Description:	Unlike ordinary backdoors in neural networks, which are introduced with artificial triggers (e.g., a specific patch) and/or by tampering with the samples, semantic backdoors are introduced simply by manipulating semantics, e.g., by labeling green cars as frogs in the training set. Because the attack targets samples with rare semantic features (such as green cars), the accuracy of the model is often minimally affected. Since the attacker does not need to modify the input samples at either training or inference time, semantic backdoors are challenging to detect and remove. Existing backdoor detection and mitigation techniques are shown to be ineffective against semantic backdoors. In this work, we propose a method to systematically detect and remove semantic backdoors. Specifically, we propose SODA (Semantic BackdOor Detection and MitigAtion), whose key idea is to conduct lightweight causality analysis to identify potential semantic backdoors based on how hidden neurons contribute to the predictions, and to remove the backdoor by adjusting the responsible neurons’ contributions towards the correct predictions through optimization. SODA is evaluated with 21 neural networks trained on 6 benchmark datasets, with 2 kinds of semantic backdoor attacks for each dataset. The results show that it effectively detects and removes semantic backdoors while preserving the accuracy of the neural networks.
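The abstract describes SODA's detection step only at a high level: measuring how hidden neurons contribute to the predictions. As a minimal sketch of what an intervention-based contribution measure can look like, the pure-Python toy below clamps one hidden neuron of a tiny ReLU network at a time to a baseline value (a do-style intervention) and averages the resulting change in each class logit. The weights, the baseline of 0, and the `flag_suspicious_classes` concentration rule are hypothetical illustrations, not SODA's published algorithm or thresholds.

```python
"""Toy sketch (assumption, not SODA's implementation): estimate each hidden
neuron's causal contribution to each class by clamping it to a baseline."""
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, x: Vector) -> Vector:
    # w has shape (out_dim, in_dim); returns the product w @ x
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def forward(w1: Matrix, w2: Matrix, x: Vector,
            clamp: Optional[Tuple[int, float]] = None) -> Vector:
    """Logits of a one-hidden-layer ReLU net; clamp=(i, v) applies do(h_i = v)."""
    h = relu(matvec(w1, x))
    if clamp is not None:
        idx, value = clamp
        h[idx] = value  # intervention: overwrite hidden neuron idx
    return matvec(w2, h)


def causal_contributions(w1: Matrix, w2: Matrix, inputs: List[Vector],
                         baseline: float = 0.0) -> Matrix:
    """contrib[i][c] = mean over inputs of logit_c(x) - logit_c(x | do(h_i = baseline))."""
    n_hidden, n_classes = len(w1), len(w2)
    contrib = [[0.0] * n_classes for _ in range(n_hidden)]
    for x in inputs:
        base = forward(w1, w2, x)
        for i in range(n_hidden):
            intervened = forward(w1, w2, x, clamp=(i, baseline))
            for c in range(n_classes):
                contrib[i][c] += (base[c] - intervened[c]) / len(inputs)
    return contrib


def flag_suspicious_classes(contrib: Matrix, threshold: float = 0.9) -> List[int]:
    """Hypothetical rule: flag a class if a single neuron carries >= threshold
    of that class's total positive contribution."""
    flagged = []
    for c in range(len(contrib[0])):
        pos = [max(0.0, row[c]) for row in contrib]
        total = sum(pos)
        if total > 0 and max(pos) / total >= threshold:
            flagged.append(c)
    return flagged


# Hypothetical toy weights: class 1 depends only on hidden neuron 2.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [[1.0, 1.0, 0.0], [0.0, 0.0, 5.0]]
X = [[1.0, 0.0], [0.0, 1.0]]

contrib = causal_contributions(W1, W2, X)
print(contrib)                           # [[0.5, 0.0], [0.5, 0.0], [0.0, 5.0]]
print(flag_suspicious_classes(contrib))  # [1]
```

Clamping to 0 is only one possible baseline; the mean activation over clean inputs is a common alternative. For a real network, the same intervention would typically be applied through the framework's layer hooks rather than a hand-written forward pass.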
