Toward robust natural language systems

Bibliographic Details
Main Author: Moon, Han Cheol
Other Authors: Joty Shafiq Rayhan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Online Access:https://hdl.handle.net/10356/169803
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-169803
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description The monumental achievements of deep learning (DL) systems may seem to guarantee their superiority and robustness, yet these systems have shown significant vulnerability to samples specifically crafted to mislead them, namely adversarial examples. Adversarial examples are seemingly indistinguishable from the original inputs, but they are perturbed so as to make the systems misbehave. This confronts us with challenging questions regarding their analysis and interpretation, and although various approaches have been proposed in response, the mathematical theory that would explain this brittleness of deep learning models remains obscure. Nonetheless, applications built on deep learning models have become ubiquitous and continue to spread across many areas. Given their potential impact on society, this ignorance about the new learning paradigm poses considerable threats to the safe and reliable operation of DL systems. The brittleness is, moreover, not unique to any specific domain, so an in-depth understanding of the potential risks to natural language processing (NLP) systems should become a leading research priority. With this aim in mind, we examine the problem rigorously from different technical perspectives. We analyze existing defense schemes and find several crucial limitations, such as unstable robustness improvements and dependence on attack obfuscation. We also investigate the robustness of the language representations of pre-trained language models (PLMs) and of their fine-tuned versions. This representation analysis extends naturally to the transferability of adversarial examples, so we further study their transferability across fine-tuned PLMs. Subsequently, we demonstrate the threats posed by textual adversarial examples. We propose a simple parametric adversarial attack agent that learns, in an end-to-end fashion, a vicinity distribution that stays sufficiently close to the original grammar and semantics of an input sequence while causing NLP systems to misbehave. To train the attack agent, we also propose an optimization algorithm called Reinforced Momentum Update (RMU), which casts the conflict between attack success and the preservation of semantics and grammar as a multi-task optimization problem and is designed to alleviate it. Our extensive experiments demonstrate the effectiveness of the proposed attack scheme. The next focus of this work is to develop a systematic framework for the seamless protection of NLP systems. To this end, we first introduce a novel textual adversarial example detection scheme. The proposed scheme leverages gradient signals to detect maliciously perturbed tokens in an input sequence and occludes such tokens through a masking process. It provides several advantages over existing methods, including improved detection performance and an interpretation of its decisions, at only a moderate computational cost: its approximate inference cost is no more than a single forward and backward pass through the target model, and it requires no additional detection module. Extensive evaluations on widely adopted NLP benchmark datasets demonstrate the efficiency and effectiveness of the proposed method. While our detection-based approach is highly effective, it does not by itself guarantee reliable operation of NLP systems, since detection alone does not shed light on the underlying brittleness of such systems.
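As an informal illustration of the gradient-guided detect-and-mask idea described above (a minimal sketch only, not the implementation evaluated in this thesis), assume a PyTorch classifier that accepts token embeddings directly, exposes its embedding layer, and has a [MASK] token id; all helper names below are hypothetical:

    import torch
    import torch.nn.functional as F

    def detect_and_mask(model, embedding_layer, input_ids, label, mask_token_id, top_k=2):
        # Embed the tokens and keep the gradient of the (non-leaf) embedding tensor.
        embeds = embedding_layer(input_ids)              # shape: (1, seq_len, hidden)
        embeds.retain_grad()
        logits = model(inputs_embeds=embeds)             # assumed to return class logits
        loss = F.cross_entropy(logits, label)            # label: tensor of shape (1,)
        loss.backward()

        # Saliency of each position: L2 norm of the gradient w.r.t. its embedding.
        saliency = embeds.grad.norm(dim=-1).squeeze(0)   # shape: (seq_len,)
        suspicious = saliency.topk(top_k).indices        # positions most likely perturbed

        # Occlude the suspected tokens with [MASK] before re-running the classifier.
        masked_ids = input_ids.clone()
        masked_ids[0, suspicious] = mask_token_id
        return masked_ids, suspicious

The gradient norm here serves only as a cheap per-token saliency signal; re-classifying the masked sequence then yields the occlusion-based prediction, which is why no separate detection module is needed.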
For reliable operation of NLP systems, we therefore introduce RSMI, a novel two-stage framework that combines randomized smoothing (RS) with masked inference (MI) to improve the adversarial robustness of NLP systems. RS transforms a classifier into a smoothed classifier to obtain robust representations, whereas MI forces the model to exploit the surrounding context of masked tokens in an input sequence. RSMI improves adversarial robustness by two to three times over existing state-of-the-art methods on benchmark datasets. By empirically demonstrating the stability of RSMI, we put it forward as a practical method for training large-scale NLP models robustly.
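Purely as a hedged sketch (the noise placement, sampling scheme, and hyper-parameters here are illustrative assumptions rather than the RSMI configuration studied in the thesis), the combination of randomized smoothing with masked inference at prediction time can be pictured as follows:

    import torch

    def smoothed_masked_predict(model, embedding_layer, input_ids, mask_token_id,
                                num_samples=8, sigma=0.1, mask_prob=0.15):
        votes = None
        for _ in range(num_samples):
            # Masked inference: randomly occlude a fraction of the tokens so the
            # model must rely on the surrounding context.
            ids = input_ids.clone()
            drop = torch.rand(ids.shape, device=ids.device) < mask_prob
            ids[drop] = mask_token_id

            # Randomized smoothing: perturb the token embeddings with Gaussian noise.
            embeds = embedding_layer(ids)
            noisy = embeds + sigma * torch.randn_like(embeds)

            logits = model(inputs_embeds=noisy)          # assumed to return class logits
            pred = torch.nn.functional.one_hot(logits.argmax(dim=-1),
                                               num_classes=logits.size(-1))
            votes = pred if votes is None else votes + pred

        # Majority vote over the noisy, partially masked samples approximates the
        # smoothed classifier's decision.
        return votes.argmax(dim=-1)

Averaging predictions over many randomized, partially masked copies is what makes the smoothed classifier less sensitive to any single perturbed token.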
author2 Joty Shafiq Rayhan
format Thesis-Doctor of Philosophy
author Moon, Han Cheol
title Toward robust natural language systems
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/169803
spelling sg-ntu-dr.10356-169803 2023-09-04T07:32:08Z
Toward robust natural language systems
Moon, Han Cheol
Joty Shafiq Rayhan
School of Computer Science and Engineering
srjoty@ntu.edu.sg
Engineering::Computer science and engineering
Doctor of Philosophy
2023-08-07T02:17:26Z 2023-08-07T02:17:26Z 2023
Thesis-Doctor of Philosophy
Moon, H. C. (2023). Toward robust natural language systems. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/169803
https://hdl.handle.net/10356/169803
10.32657/10356/169803
en
M21J6a0080
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University