Domain adaptation : methods and applications

Conventional machine learning requires sufficient labeled data to achieve satisfactory generalization performance. However, acquiring labeled data is expensive and time-consuming. Domain adaptation provides an efficient way to cope with this label scarcity. The objective of domain adaptation...

Full description

Saved in:
Bibliographic Details
Main Author: Wei, Pengfei
Other Authors: Ke Yi Ping, Kelly
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects:
Online Access:https://hdl.handle.net/10356/82943
http://hdl.handle.net/10220/47539
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-82943
record_format dspace
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description Conventional machine learning requires sufficient labeled data to achieve satisfactory generalization performance, yet acquiring labeled data is expensive and time-consuming. Domain adaptation provides an efficient way to cope with this label scarcity: its objective is to learn a model that generalizes well to a target domain in which labeled data is scarce or absent, by leveraging knowledge transferred from a different but related source domain with plenty of labeled data. Depending on whether the source and target feature spaces are the same, domain adaptation falls into two classes: homogeneous and heterogeneous. In homogeneous domain adaptation the crucial step is to align the source and target feature distributions, whereas in the heterogeneous case the heterogeneity of the feature spaces must be eliminated before the distributions can be matched. In this thesis, we develop new algorithms for both homogeneous and heterogeneous domain adaptation. For the homogeneous case, we propose novel frameworks that improve adaptation performance by addressing the limitations of a state-of-the-art domain adaptation method. For the heterogeneous case, we eliminate the heterogeneity of the feature spaces without using any of the supplementary information that existing heterogeneous adaptation studies require. Beyond these two conventional settings, we also tackle a more general and challenging problem: fusing knowledge from multiple source domains for regression tasks, a setting that arises in many real-world applications but has received little attention in the literature. Specifically, three novel works are conducted. First, we develop two methods for the homogeneous setting: a deep feature learning method and a subspace-based method. The deep feature learning method builds on the marginalized stacked denoising autoencoder (mSDA). We first investigate, from the perspective of adaptive regularization, why mSDA benefits adaptation tasks. We then propose a new mSDA with data-dependent multinomial dropout noise (mSDAmd), which overcomes limitations of the conventional mSDA and further improves adaptation performance. Finally, we develop a deep nonlinear feature coding (DNFC) framework that introduces two new elements to mSDA: domain divergence minimization via Maximum Mean Discrepancy (MMD) and nonlinear coding via kernelization. We also investigate how both the conventional mSDA and our proposed mSDAmd generalize to the DNFC framework. For the subspace-based method, we introduce a multiple-manifolds assumption into domain adaptation and develop a local manifold information transfer framework under this assumption. Specifically, we first propose a manifold neighborhood preservation embedding algorithm that preserves the neighborhood structure of each low-dimensional manifold during subspace learning, and then combine it with global distribution discrepancy minimization in one unified framework.
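To make the Maximum Mean Discrepancy term mentioned above concrete, the following is a minimal NumPy sketch of the biased empirical estimate of the squared MMD between source and target samples under an RBF kernel. The function names, the bandwidth gamma, and the Gaussian toy data are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    Kss = rbf_kernel(Xs, Xs, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()

# Toy check: samples drawn from shifted Gaussians give a larger MMD than
# samples drawn from the same distribution.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(200, 10))   # hypothetical source features
Xt = rng.normal(0.5, 1.0, size=(200, 10))   # hypothetical target features
print(mmd2(Xs, Xt))
```

In feature-learning frameworks of this kind, a term of this form is typically added to the learning objective so that the learned representation pulls the source and target empirical distributions together.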
Next, we propose a new problem setting, hybrid domain adaptation, a special case of heterogeneous domain adaptation that can be solved efficiently without any supplementary information. We propose a general domain-specific feature transfer (DSFT) framework that links the two domains through their common features while simultaneously reducing domain divergences. Specifically, we learn translations between the common features and the domain-specific features, and cross-use the learned translations to transfer the domain-specific features of one domain to the other, thereby composing a homogeneous space in which the domain divergences are minimized. Both a linear and a nonlinear instantiation of the DSFT framework are presented, and extensive experiments verify the effectiveness of DSFT. Finally, we study a more general problem: transferring knowledge from multiple source domains. Unlike conventional domain adaptation, which assumes the source and target tasks are the same, we consider a more challenging case in which the tasks of different domains are also heterogeneous. Specifically, we model diverse task similarities through a transfer covariance function. We first investigate the feasibility and performance of a family of transfer covariance functions that represent the pairwise similarity between each source domain and the target domain. We prove theoretically that using such a transfer covariance function in standard Gaussian process modelling can capture only one shared similarity coefficient for all sources, which may lead to unsatisfactory adaptation performance. This motivates TCMSStack, an integrated strategy that combines the benefits of the transfer covariance function with stacking. Experimental studies on both synthetic and real-world datasets verify the effectiveness of TCMSStack.
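As an illustration of the kind of transfer covariance function discussed above, the sketch below builds a Gram matrix in which within-domain pairs keep a base RBF kernel value and every cross-domain pair is damped by a single similarity coefficient lam. This is a simple, commonly used construction assumed here for illustration; it is not necessarily the exact family analysed in the thesis. With several source domains, the one coefficient is shared by all of them, which mirrors the limitation that motivates TCMSStack.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def transfer_gram(X, domains, lam, gamma=1.0):
    """Illustrative transfer covariance: within-domain pairs use the base
    kernel, cross-domain pairs are scaled by lam in [0, 1].  The result is a
    convex combination of the full kernel and its block-diagonal within-domain
    part, so it stays positive semi-definite."""
    K = rbf(X, X, gamma)
    same = domains[:, None] == domains[None, :]
    return np.where(same, K, lam * K)

# Toy setup: two source domains and one target domain pooled into a single
# Gaussian process regressor.  The single lam applies to every cross-domain
# block, so both sources are forced to share one similarity coefficient.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))                    # 25 + 25 source points, 10 target
y = rng.normal(size=60)
d = np.array([0] * 25 + [1] * 25 + [2] * 10)    # hypothetical domain labels
K = transfer_gram(X, d, lam=0.6) + 1e-2 * np.eye(60)   # add noise variance
alpha = np.linalg.solve(K, y)                   # weights of the GP posterior mean
```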
author2 Ke Yi Ping, Kelly
format Theses and Dissertations
author Wei, Pengfei
author_sort Wei, Pengfei
title Domain adaptation : methods and applications
publishDate 2019
url https://hdl.handle.net/10356/82943
http://hdl.handle.net/10220/47539
spelling sg-ntu-dr.10356-82943 2020-06-24T03:10:23Z Domain adaptation : methods and applications Wei, Pengfei Ke Yi Ping, Kelly School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Doctor of Philosophy 2019-01-22T12:44:56Z 2019-12-06T15:08:46Z 2019-01-22T12:44:56Z 2019-12-06T15:08:46Z 2018 Thesis Wei, P. (2018). Domain adaptation : methods and applications. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/82943 http://hdl.handle.net/10220/47539 10.32657/10220/47539 en 164 p. application/pdf