Text backdoor detection using an interpretable RNN abstract model

Bibliographic Details
Main Authors: FAN, Ming; SI, Ziliang; XIE, Xiaofei; LIU, Yang; LIU, Ting
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects: Training; Recurrent neural networks; Task analysis; Motion pictures; Data models; Analytical models; Sentiment analysis; Text backdoor detection; RNN; model abstraction; interpretation; OS and Networks; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/7118
Institution: Singapore Management University
DOI: 10.1109/TIFS.2021.3103064
Collection: Research Collection School Of Computing and Information Systems
Description
Deep neural networks (DNNs) are known to be inherently vulnerable to malicious attacks such as adversarial attacks and backdoor attacks. The former are crafted by adding small perturbations to benign inputs so as to fool a DNN. The latter generally embed a hidden pattern in a DNN by poisoning the dataset during training, which causes the infected model to misbehave on inputs containing a predefined trigger while performing normally on all others. Much work has been conducted on defending against adversarial samples, while the backdoor attack has received much less attention, especially in recurrent neural networks (RNNs), which play an important role in text processing. Two main limitations make it hard to directly apply existing image backdoor detection approaches to RNN-based text classification systems. First, a layer in an RNN does not preserve the same feature latent space for different inputs, making it impossible to map an inserted pattern to fixed neural activations. Second, text data is inherently discrete, making it hard to optimize text inputs the way image pixels can be optimized. In this work, we propose a novel backdoor detection approach named InterRNN for RNN-based text classification systems from the interpretation perspective. Specifically, we first propose a novel RNN interpretation technique that constructs a nondeterministic finite automaton (NFA) based abstract model, which effectively reduces the analysis complexity of an RNN while preserving its original logic rules. Then, based on the abstract model, we obtain interpretation results that explain the fundamental reason behind the decision for each input. We then detect trigger words by leveraging the differences between model behaviors on backdoor sentences and on normal sentences. Extensive experiments on four benchmark datasets demonstrate that our approach generates better interpretation results than state-of-the-art approaches and effectively detects backdoors in RNNs.
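The NFA-style abstraction described in the abstract can be illustrated with a minimal sketch: record the RNN's hidden-state vector after each token, map each vector to a discrete abstract state, and count transitions between abstract states per input token, yielding a nondeterministic automaton with frequencies. This is not the authors' InterRNN implementation; the function names, the quantization-based state abstraction (a crude stand-in for the paper's state clustering), and the `interval` parameter are all assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

def abstract_state(h, interval=0.5):
    """Map a concrete hidden-state vector to a discrete abstract state
    by quantizing each dimension. A simple stand-in for the clustering
    step typically used to build abstract models of RNNs."""
    return tuple(np.floor(np.asarray(h) / interval).astype(int))

def build_abstract_model(traces, interval=0.5):
    """traces: list of (tokens, hidden_states) pairs, where
    hidden_states[i] is the RNN hidden vector after consuming tokens[i].
    Returns transition counts (state, token) -> {next_state: count},
    i.e. an NFA with frequencies: the same (state, token) pair may
    lead to several next states across different inputs."""
    transitions = defaultdict(lambda: defaultdict(int))
    for tokens, states in traces:
        prev = ("<init>",)  # distinguished initial abstract state
        for tok, h in zip(tokens, states):
            cur = abstract_state(h, interval)
            transitions[(prev, tok)][cur] += 1
            prev = cur
    return transitions

# Toy demonstration with two short "sentences" and fake 2-d hidden states.
traces = [
    (["the", "movie"], [np.array([0.1, 0.6]), np.array([0.7, 0.2])]),
    (["the", "film"],  [np.array([0.2, 0.7]), np.array([0.6, 0.1])]),
]
model = build_abstract_model(traces, interval=0.5)
```

On top of such a model, trigger-word detection in the spirit of the abstract would compare transition behavior: a trigger word drags very different sentences into the same unusual abstract-state path, whereas normal words produce transitions consistent with the clean data.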