Robustness and cross-lingual transfer: An exploration of out-of-distribution scenario in natural language processing
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Institution: Singapore Management University
Online Access: https://ink.library.smu.edu.sg/etd_coll/446
https://ink.library.smu.edu.sg/context/etd_coll/article/1444/viewcontent/GPIS_AY2018_PhD_SichengYu.pdf
Summary: Most traditional machine learning and deep learning methods are based on the premise that training data and test data are independent and identically distributed, i.e., IID. However, this is an idealization. In real-world applications, the test set and the training data often follow different distributions, which we refer to as the out-of-distribution, i.e., OOD, setting. As a result, models trained with traditional methods suffer an undesirable performance drop on OOD test sets, and it is necessary to develop techniques that address this problem for real applications. In this dissertation, we present four pieces of work on OOD in Natural Language Processing (NLP), which can be grouped into two sub-categories: adversarial robustness and cross-lingual transfer.
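The IID-versus-OOD gap is easy to make concrete. Below is a minimal sketch, assuming synthetic two-class Gaussian data and a scikit-learn linear classifier, in which a model trained on one distribution is scored on an IID split and on a mean-shifted OOD split; the accuracy drop mirrors the phenomenon described above.

```python
# Minimal illustration of the IID vs. OOD performance gap on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    # Two Gaussian classes in 2-D; `shift` translates both class means,
    # simulating covariate shift between training and OOD test data.
    y = rng.integers(0, 2, size=n)
    means = (2.0 * y - 1.0).reshape(-1, 1) + shift
    x = rng.normal(loc=means, scale=1.0, size=(n, 2))
    return x, y

x_train, y_train = sample(2000)            # training distribution
x_iid, y_iid = sample(2000)                # IID test split
x_ood, y_ood = sample(2000, shift=1.5)     # shifted (OOD) test split

clf = LogisticRegression().fit(x_train, y_train)
print(f"IID accuracy: {clf.score(x_iid, y_iid):.3f}")  # close to training accuracy
print(f"OOD accuracy: {clf.score(x_ood, y_ood):.3f}")  # noticeably lower
```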
For the sub-category of adversarial robustness, the two works are summarized as follows:
We target the question answering task. Question answering aims to find the answer given a passage, a question, and possibly a set of options. Oftentimes question answering models over-rely on shortcut patterns, e.g., word alignment between the question and the passage, instead of robust reasoning. Standard question answering models may therefore fail on adversarial OOD sets where the shortcut no longer works. To this end, we analyze the shortcuts in question answering with the help of causal graphs and propose a counterfactual variable control method to mitigate the problem. Experimental results on different adversarial OOD sets show that our method improves both the robustness and the interpretability of question answering models.
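As a loose illustration (not the dissertation's counterfactual variable control method itself), one common counterfactual-style correction subtracts the logits of a hypothetical shortcut-only branch, e.g., one fed only question-passage word-overlap features, from the full model's logits at inference. The function names and the `lam` weighting knob below are illustrative assumptions.

```python
# Sketch: remove a shortcut's direct effect at inference time by subtracting
# a shortcut-only branch's logits, in the spirit of counterfactual reasoning
# over a causal graph. Names and the `lam` knob are hypothetical.
import torch

def debiased_logits(full_logits, shortcut_logits, lam=1.0):
    # Keep the part of the prediction that the shortcut branch alone
    # cannot explain.
    return full_logits - lam * shortcut_logits

# Toy example with three answer options.
full = torch.tensor([2.0, 1.5, 0.3])      # full model prefers option 0...
shortcut = torch.tensor([1.8, 0.2, 0.1])  # ...but mostly via the shortcut
print(debiased_logits(full, shortcut).argmax().item())  # -> 1 after correction
```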
We explore model debiasing in the scenario of unknown bias, where there is no prior knowledge about the bias, for natural language understanding tasks. From a causal perspective, the vulnerability of deep models is caused by confounders, e.g., natural biases in the data. A general deconfounding method in causal inference is intervention. We propose an automatic and multi-granular intervention method for debiasing natural language understanding models; with its help, we achieve new state-of-the-art performance on three tasks under their OOD settings.
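For readers unfamiliar with intervention, the classic backdoor-adjustment form, sketched below under assumed names, approximates P(y | do(x)) by marginalizing the model's prediction over observed confounder values; the dissertation's automatic, multi-granular intervention is more elaborate, so treat this only as the underlying idea.

```python
# Backdoor-adjustment sketch (assumed formulation, not the dissertation's
# exact multi-granular intervention): P(y | do(x)) ~= sum_z P(y | x, z) P(z).
import torch

def interventional_prediction(model, x, confounder_values, priors):
    probs = 0.0
    for z, p_z in zip(confounder_values, priors):
        logits = model(x, z)                      # prediction conditioned on z
        probs = probs + p_z * logits.softmax(-1)  # marginalize the confounder out
    return probs

def toy_model(x, z):
    # Stand-in model whose logits depend on both the input and the confounder.
    return x + z

x = torch.tensor([0.5, -0.2, 0.1])
zs = [torch.tensor([1.0, 0.0, 0.0]), torch.tensor([0.0, 0.0, 1.0])]
print(interventional_prediction(toy_model, x, zs, priors=[0.7, 0.3]))
```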
For the sub-category of cross-lingual transfer, the two works are summarized as follows:
We investigate zero-shot and few-shot cross-lingual understanding tasks, where the model is trained only with English data (zero-shot) or with only a few target-language examples in addition (few-shot), and is then applied directly to the target language, which is OOD relative to the training data. We propose a counterfactual syntax method that injects universal syntax into the model and further encourages the model to focus on this syntactic information to assist cross-lingual transfer. The enriched and exploited syntax helps the model attain state-of-the-art performance on three cross-lingual understanding benchmarks.
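One plausible way to inject universal syntax, sketched here as an assumption rather than the dissertation's exact architecture, is to add embeddings of language-independent UPOS tags to the word embeddings before encoding, so downstream layers can rely on syntactic categories shared across languages.

```python
# Sketch: fuse universal syntax (UPOS tag embeddings) into token
# representations before the encoder. Class name and sizes are illustrative.
import torch
import torch.nn as nn

class SyntaxAugmentedEmbedding(nn.Module):
    def __init__(self, vocab_size, num_upos_tags, dim):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)
        self.upos = nn.Embedding(num_upos_tags, dim)  # shared across languages

    def forward(self, token_ids, upos_ids):
        # Summing the two embeddings lets downstream layers attend to
        # universal syntactic categories alongside lexical content.
        return self.word(token_ids) + self.upos(upos_ids)

emb = SyntaxAugmentedEmbedding(vocab_size=30000, num_upos_tags=18, dim=64)
tokens = torch.randint(0, 30000, (2, 5))  # batch of 2 sentences, 5 tokens each
upos = torch.randint(0, 18, (2, 5))       # their UPOS tags
print(emb(tokens, upos).shape)            # torch.Size([2, 5, 64])
```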
We focus on the issue of translationese artifacts in the translate-train method for cross-lingual transfer, where translated text in the target language is used for data augmentation. Although this introduces target-language data into training, it also opens a gap between original text and translationese. We propose an approach that mitigates this gap on the source language and then applies it to the target languages. The results demonstrate that our approach outperforms several strong baselines.
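A minimal sketch of one such mitigation, assumed rather than taken from the dissertation, is a symmetric-KL consistency loss on the source language between predictions on an original sentence and on its round-trip-translated (translationese) counterpart, discouraging the model from keying on translationese artifacts.

```python
# Sketch: consistency term between predictions on original text and on its
# round-trip translation. The helper name is hypothetical.
import torch
import torch.nn.functional as F

def consistency_loss(logits_original, logits_translationese):
    # Symmetric KL between the two predictive distributions; minimizing it
    # pushes the model to predict alike on both writing styles.
    p = F.log_softmax(logits_original, dim=-1)
    q = F.log_softmax(logits_translationese, dim=-1)
    return 0.5 * (F.kl_div(q, p, reduction="batchmean", log_target=True)
                  + F.kl_div(p, q, reduction="batchmean", log_target=True))

# Toy logits for a batch of 2 examples, 3 classes.
orig = torch.tensor([[2.0, 0.1, -1.0], [0.3, 0.2, 0.1]])
trans = torch.tensor([[1.5, 0.3, -0.8], [0.1, 0.4, 0.2]])
print(consistency_loss(orig, trans))
```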