Data-efficient domain adaptation for pretrained language models

Bibliographic Details
Main Author: Guo, Xu
Other Authors: Yu Han
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/167965
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-167965
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Guo, Xu
Data-efficient domain adaptation for pretrained language models
description Recent advances in Natural Language Processing (NLP) are built on a range of large-scale pretrained language models (PLMs) based on deep transformer neural networks. These PLMs learn contextualized word representations through language-modeling objectives by training the entire model on massive unlabeled corpora with self-supervised learning, bringing about a paradigm shift that moves the focus from customizing different models for different tasks to adapting one PLM to all tasks. Studying how to adapt a general-purpose PLM to a specific domain of interest is therefore of great significance to the deployment of PLMs. The mainstream practice is to finetune a PLM with a task-specific head on a labeled dataset from the target domain. However, for most target applications labeled data is limited, and in many low-resource scenarios it is scarce. Because of the huge number of parameters in a PLM, such small datasets often struggle to harness the power of its language priors. As a result, even on the same task, a PLM finetuned on one dataset can suffer performance degradation when applied to another dataset separated by a domain gap, because it overfits the original training set. This phenomenon hinders the wide adoption of PLMs in practice, particularly in the face of new domains, and calls for approaches that enhance the generalization performance of PLMs during adaptation without requiring more labeled data. Early domain adaptation methods, which leverage similar source domains to boost model performance on the target domains, were developed for customized models built on traditional neural networks such as LSTMs. Compared with PLMs, these models are shallow, take longer to converge, and carry no prior knowledge. Studies show that some popular domain adaptation methods can even harm the generalization performance of PLMs on the target domains. The unique characteristics of PLMs, such as their unprecedented scale, rich language priors, and many hitherto underexplored skills, could be uncontrollable factors that make them exhibit learning behaviors different from those of traditional models. There is therefore a need to develop algorithms that enhance the domain adaptation performance of PLMs, thereby accelerating their wide adoption in real-world scenarios. This thesis explores techniques that make efficient use of the labeled data in the target domain and better adapt a given PLM to the target domains of interest by effectively transferring knowledge from similar source domains. To achieve this goal, we conduct research from three perspectives along a machine learning pipeline, each assuming that only specified locations can be updated with the available computing resources: keeping all other conditions fixed, we make updates only to the input data, the model representations, or the output predictions, respectively. Under each scenario, we show how to achieve better generalization performance with limited labeled data from the target domains.
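As an illustration of the mainstream practice described above (a minimal sketch assuming a HuggingFace Transformers / PyTorch setup, not code from the thesis), fine-tuning a PLM with a task-specific classification head on a small labeled target-domain dataset might look as follows; the model name, hyperparameters, and helper function are illustrative assumptions.

# Minimal sketch (assumed setup, not from the thesis): fine-tune a PLM plus a
# task-specific classification head on a small labeled target-domain dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative PLM choice
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def finetune(texts, labels, epochs=3, lr=2e-5, batch_size=16):
    """Standard full-model fine-tuning on target-domain examples."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"],
                                      torch.tensor(labels)),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, y in loader:
            out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
            out.loss.backward()              # cross-entropy from the task-specific head
            optimizer.step()
            optimizer.zero_grad()
    return model

The abstract's point is that when the target-domain dataset fed to such a routine is small, the many parameters of the PLM are prone to overfitting it, motivating the adaptation techniques summarized next.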
To sum up, we propose: a new algorithm that generates adversarial perturbations using the domain adaptation objective to enhance the transferability of soft prompt tuning in low-resource scenarios; a new model optimization algorithm that takes into account the next-step gradients of the adversarial domain discriminator when optimizing the task classifier, so as to accommodate the competing losses; and a new federated learning framework that calibrates the conditional probability distribution to adapt the same PLM to multiple domains with different label distributions. The following chapters present the specific problems, related work, detailed methods, extensive experiments, and thorough discussions, and shed light on how to build on traditional machine learning methods while catering to newly emerging learning paradigms.
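For the first contribution, a hypothetical sketch of the general idea only (not the thesis's actual algorithm) is given below: a soft prompt is perturbed with an FGSM-style gradient step on a combined task-plus-domain loss. The modules encoder, task_head, and domain_discriminator, the way the losses are combined, and the value of epsilon are all assumptions made for illustration.

# Hypothetical sketch of adversarially perturbing a soft prompt with a domain-aware loss.
# Not the thesis's algorithm; module names, loss combination, and epsilon are assumptions.
import torch
import torch.nn.functional as F

def adversarially_perturb_prompt(soft_prompt, input_embeds, labels, domain_labels,
                                 encoder, task_head, domain_discriminator, epsilon=1e-2):
    # soft_prompt: (prompt_len, hidden_dim) trainable embeddings prepended to the input.
    prompt = soft_prompt.detach().requires_grad_(True)
    batch = input_embeds.size(0)
    embeds = torch.cat([prompt.expand(batch, -1, -1), input_embeds], dim=1)
    hidden = encoder(inputs_embeds=embeds).last_hidden_state[:, 0]   # first-token pooling
    loss = (F.cross_entropy(task_head(hidden), labels)
            + F.cross_entropy(domain_discriminator(hidden), domain_labels))
    grad = torch.autograd.grad(loss, prompt)[0]
    return prompt + epsilon * grad.sign()     # perturbed prompt used for further tuning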
author2 Yu Han
author_facet Yu Han
Guo, Xu
format Thesis-Doctor of Philosophy
author Guo, Xu
author_sort Guo, Xu
title Data-efficient domain adaptation for pretrained language models
title_short Data-efficient domain adaptation for pretrained language models
title_full Data-efficient domain adaptation for pretrained language models
title_fullStr Data-efficient domain adaptation for pretrained language models
title_full_unstemmed Data-efficient domain adaptation for pretrained language models
title_sort data-efficient domain adaptation for pretrained language models
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/167965
_version_ 1772827944417230848
spelling sg-ntu-dr.10356-167965 2023-06-01T08:00:48Z Data-efficient domain adaptation for pretrained language models Guo, Xu Yu Han School of Computer Science and Engineering han.yu@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2023-05-21T06:23:16Z 2023-05-21T06:23:16Z 2023 Thesis-Doctor of Philosophy Guo, X. (2023). Data-efficient domain adaptation for pretrained language models. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167965 10.32657/10356/167965 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University