Toward robust natural language systems
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/169803
Institution: Nanyang Technological University
Summary: The monumental achievements of deep learning (DL) seem to guarantee the absolute superiority and robustness of modern DL systems, yet these systems have shown significant vulnerability to samples specifically crafted to misguide them, namely adversarial examples. Adversarial examples are seemingly indistinguishable from the original inputs, but they are perturbed to cause the systems to misbehave. This confronts us with challenging questions regarding their analysis and interpretation. In response, various approaches have been proposed, but the mathematical theory of deep learning that would explain this brittleness remains obscure. Nonetheless, applications built on deep learning models have become ubiquitous and continue to spread across many areas. Given their potential impact on society, ignoring this weakness of the new learning paradigm would pose considerable threats to the safe and reliable operation of DL systems.
This brittleness is not unique to any specific domain, so an in-depth understanding of the potential risks to natural language processing (NLP) systems should become a leading research priority. With this aim in mind, we delve into the problem rigorously from different technical perspectives. We analyze existing defense schemes and find several crucial limitations, such as unstable robustness improvements and a dependency on attack obfuscation. We also investigate the robustness of the language representations learned by pre-trained language models (PLMs) and their fine-tuned versions. Since this representation analysis naturally extends to the transferability of adversarial examples, we further study their transferability across fine-tuned PLMs.
Subsequently, we aim to demonstrate the threats posed by textual adversarial examples. We propose a simple parametric adversarial attack agent that, in an end-to-end fashion, learns a vicinity distribution that stays sufficiently close to the grammar and semantics of the original input sequence while causing the NLP system to misbehave. To train the attack agent, we also propose an optimization algorithm called Reinforced Momentum Update (RMU), which is designed to alleviate the conflict between successful adversarial attack and preservation of semantics and grammar, cast as a multi-task optimization problem. Our extensive experiments demonstrate the effectiveness of the proposed attack scheme.
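To make the multi-task tension concrete, the sketch below shows one way a momentum update could balance an attack objective against a semantics/grammar-preservation objective. The function name, the loss weighting `lam`, and the use of a simple exponential moving average are illustrative assumptions; the thesis's actual RMU rule may differ.

```python
import torch

def momentum_attack_step(params, attack_loss, sim_loss,
                         momentum, lr=1e-3, beta=0.9, lam=0.5):
    """Hypothetical momentum update trading off attack success (attack_loss)
    against semantics/grammar preservation (sim_loss). Not the exact RMU rule."""
    total = attack_loss + lam * sim_loss              # combined multi-task objective
    grads = torch.autograd.grad(total, params)
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        m = beta * m + (1.0 - beta) * g               # smooth the combined gradient
        new_params.append((p - lr * m).detach().requires_grad_(True))
        new_momentum.append(m)
    return new_params, new_momentum
```

In this reading, the momentum term damps updates whenever the two gradients point in conflicting directions, which is one plausible way to ease the attack-versus-preservation conflict described above.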
The next focus of this work is to develop a systematic framework for the seamless protection of NLP systems. To this end, we first introduce a novel textual adversarial example detection scheme. The proposed scheme leverages gradient signals to detect maliciously perturbed tokens in an input sequence and occludes such tokens through a masking process. It offers several advantages over existing methods, including improved detection performance and an interpretation of its decisions, at only moderate computational cost: its approximate inference cost is no more than a single forward and backward pass through the target model, with no additional detection module required. Extensive evaluations on widely adopted NLP benchmark datasets demonstrate the efficiency and effectiveness of the proposed method.
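As a rough illustration of the gradient-signal idea, the sketch below scores each input token by the gradient norm of the loss with respect to its embedding and masks the highest-scoring tokens. It assumes a HuggingFace-style classifier exposing `get_input_embeddings()` and accepting `inputs_embeds`; the function name, `top_k`, and other details are assumptions rather than the thesis's exact procedure.

```python
import torch
import torch.nn.functional as F

def mask_suspicious_tokens(model, input_ids, label, mask_id, top_k=3):
    """Illustrative gradient-based detection: one forward and one backward pass,
    then occlude the tokens whose embedding gradients are largest."""
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    loss = F.cross_entropy(logits, label)
    loss.backward()                                    # single backward pass
    scores = embeds.grad.norm(dim=-1).squeeze(0)       # per-token gradient magnitude
    suspicious = scores.topk(min(top_k, scores.numel())).indices
    masked_ids = input_ids.clone()
    masked_ids[0, suspicious] = mask_id                # replace with the [MASK] token id
    return masked_ids
```

Here `mask_id` would be the tokenizer's [MASK] id, and the masked sequence can be re-classified to check whether the prediction changes.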
While our detection-based approach shows significant effectiveness, it does not by itself guarantee reliable operation of NLP systems, as detection sheds little light on the underlying brittleness of such systems. For reliable operation, we introduce RSMI, a novel two-stage framework that combines randomized smoothing (RS) with masked inference (MI) to improve the adversarial robustness of NLP systems. RS transforms a classifier into a smoothed classifier to obtain robust representations, whereas MI forces a model to exploit the surrounding context of a masked token in an input sequence. RSMI improves adversarial robustness by 2 to 3 times over existing state-of-the-art methods on benchmark datasets. By empirically demonstrating the stability of RSMI, we put it forward as a practical method for robustly training large-scale NLP models.
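A minimal sketch of how the two stages might compose at inference time is given below: random token masking (MI) plus Gaussian noise on the embeddings (RS), averaged over several noisy samples. The hyperparameters (`n_samples`, `sigma`, `p_mask`) and the HuggingFace-style interface are assumptions, not the published RSMI configuration.

```python
import torch

@torch.no_grad()
def rsmi_style_predict(model, input_ids, mask_id, n_samples=8,
                       sigma=0.1, p_mask=0.15):
    """Illustrative smoothed prediction: mask random tokens, add Gaussian noise
    to their embeddings, and average the resulting class probabilities."""
    embed = model.get_input_embeddings()
    probs = []
    for _ in range(n_samples):
        ids = input_ids.clone()
        mask = torch.rand_like(ids, dtype=torch.float) < p_mask
        ids[mask] = mask_id                              # masked inference (MI)
        noisy = embed(ids)
        noisy = noisy + sigma * torch.randn_like(noisy)  # randomized smoothing (RS)
        probs.append(model(inputs_embeds=noisy).logits.softmax(dim=-1))
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```

Averaging over several masked, noise-perturbed copies is what turns the base classifier into a smoothed one, so a single adversarial perturbation is less likely to flip the aggregated prediction.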