Toward robust natural language systems

Bibliographic Details
Main Author: Moon, Han Cheol
Other Authors: Joty Shafiq Rayhan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Online Access:https://hdl.handle.net/10356/169803
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-169803
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description The monumental achievements of deep learning (DL) systems may seem to guarantee their superiority and robustness, yet these systems have shown significant vulnerability to samples specifically crafted to mislead them, namely adversarial examples. Adversarial examples are seemingly indistinguishable from the original inputs, but they are perturbed so as to make the systems misbehave. This confronts us with challenging questions regarding their analysis and interpretation, and although various approaches have been proposed in response, the mathematical theory that would explain this brittleness of deep learning models remains obscure. Nonetheless, applications built on deep learning models have become ubiquitous and continue to spread across many areas. Given their potential impact on society, this ignorance about the new learning paradigm poses considerable threats to the safe and reliable operation of DL systems. The brittleness is, moreover, not unique to any specific domain, so an in-depth understanding of the potential risks to natural language processing (NLP) systems should become a leading research priority. With this aim in mind, we examine the problem rigorously from different technical perspectives. We analyze existing defense schemes and find several crucial limitations, such as unstable robustness improvements and dependence on attack obfuscation. We also investigate the robustness of the language representations of pre-trained language models (PLMs) and of their fine-tuned versions. This representation analysis extends naturally to the transferability of adversarial examples, so we further study their transferability across fine-tuned PLMs. Subsequently, we demonstrate the threats posed by textual adversarial examples. We propose a simple parametric adversarial attack agent that learns, in an end-to-end fashion, a vicinity distribution that stays sufficiently close to the original grammar and semantics of an input sequence while causing NLP systems to misbehave. To train the attack agent, we also propose an optimization algorithm called Reinforced Momentum Update (RMU), which casts the conflict between attack success and the preservation of semantics and grammar as a multi-task optimization problem and is designed to alleviate it. Our extensive experiments demonstrate the effectiveness of the proposed attack scheme. The next focus of this work is to develop a systematic framework for the seamless protection of NLP systems. To this end, we first introduce a novel textual adversarial example detection scheme. The proposed scheme leverages gradient signals to detect maliciously perturbed tokens in an input sequence and occludes such tokens through a masking process. It provides several advantages over existing methods, including improved detection performance and an interpretation of its decisions, at only a moderate computational cost: its approximate inference cost is no more than a single forward and backward pass through the target model, and it requires no additional detection module. Extensive evaluations on widely adopted NLP benchmark datasets demonstrate the efficiency and effectiveness of the proposed method. While our detection-based approach is highly effective, it does not by itself guarantee reliable operation of NLP systems, since detection alone does not shed light on the underlying brittleness of such systems.
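As an informal illustration of the gradient-guided detect-and-mask idea described above (a minimal sketch only, not the implementation evaluated in this thesis), assume a PyTorch classifier that accepts token embeddings directly, exposes its embedding layer, and has a [MASK] token id; all helper names below are hypothetical:

    import torch
    import torch.nn.functional as F

    def detect_and_mask(model, embedding_layer, input_ids, label, mask_token_id, top_k=2):
        # Embed the tokens and keep the gradient of the (non-leaf) embedding tensor.
        embeds = embedding_layer(input_ids)              # shape: (1, seq_len, hidden)
        embeds.retain_grad()
        logits = model(inputs_embeds=embeds)             # assumed to return class logits
        loss = F.cross_entropy(logits, label)            # label: tensor of shape (1,)
        loss.backward()

        # Saliency of each position: L2 norm of the gradient w.r.t. its embedding.
        saliency = embeds.grad.norm(dim=-1).squeeze(0)   # shape: (seq_len,)
        suspicious = saliency.topk(top_k).indices        # positions most likely perturbed

        # Occlude the suspected tokens with [MASK] before re-running the classifier.
        masked_ids = input_ids.clone()
        masked_ids[0, suspicious] = mask_token_id
        return masked_ids, suspicious

The gradient norm here serves only as a cheap per-token saliency signal; re-classifying the masked sequence then yields the occlusion-based prediction, which is why no separate detection module is needed.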
For reliable operation of NLP systems, we therefore introduce RSMI, a novel two-stage framework that combines randomized smoothing (RS) with masked inference (MI) to improve the adversarial robustness of NLP systems. RS transforms a classifier into a smoothed classifier to obtain robust representations, whereas MI forces the model to exploit the surrounding context of masked tokens in an input sequence. RSMI improves adversarial robustness by two to three times over existing state-of-the-art methods on benchmark datasets. By empirically demonstrating the stability of RSMI, we put it forward as a practical method for training large-scale NLP models robustly.
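Purely as a hedged sketch (the noise placement, sampling scheme, and hyper-parameters here are illustrative assumptions rather than the RSMI configuration studied in the thesis), the combination of randomized smoothing with masked inference at prediction time can be pictured as follows:

    import torch

    def smoothed_masked_predict(model, embedding_layer, input_ids, mask_token_id,
                                num_samples=8, sigma=0.1, mask_prob=0.15):
        votes = None
        for _ in range(num_samples):
            # Masked inference: randomly occlude a fraction of the tokens so the
            # model must rely on the surrounding context.
            ids = input_ids.clone()
            drop = torch.rand(ids.shape, device=ids.device) < mask_prob
            ids[drop] = mask_token_id

            # Randomized smoothing: perturb the token embeddings with Gaussian noise.
            embeds = embedding_layer(ids)
            noisy = embeds + sigma * torch.randn_like(embeds)

            logits = model(inputs_embeds=noisy)          # assumed to return class logits
            pred = torch.nn.functional.one_hot(logits.argmax(dim=-1),
                                               num_classes=logits.size(-1))
            votes = pred if votes is None else votes + pred

        # Majority vote over the noisy, partially masked samples approximates the
        # smoothed classifier's decision.
        return votes.argmax(dim=-1)

Averaging predictions over many randomized, partially masked copies is what makes the smoothed classifier less sensitive to any single perturbed token.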
author2 Joty Shafiq Rayhan
format Thesis-Doctor of Philosophy
author Moon, Han Cheol
title Toward robust natural language systems
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/169803
spelling sg-ntu-dr.10356-169803 2023-09-04T07:32:08Z
Toward robust natural language systems
Moon, Han Cheol
Joty Shafiq Rayhan
School of Computer Science and Engineering
srjoty@ntu.edu.sg
Engineering::Computer science and engineering
Doctor of Philosophy
2023-08-07T02:17:26Z 2023-08-07T02:17:26Z 2023
Thesis-Doctor of Philosophy
Moon, H. C. (2023). Toward robust natural language systems. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/169803
https://hdl.handle.net/10356/169803
10.32657/10356/169803
en
M21J6a0080
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University