Feature analysis of marginalized stacked denoising autoenconder for unsupervised domain adaptation
Marginalized stacked denoising autoencoder (mSDA), has recently emerged with demonstrated effectiveness in domain adaptation. In this paper, we investigate the rationale for why mSDA benefits domain adaptation tasks from the perspective of adaptive regularization. Our investigations focus on two typ...
Saved in:
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: |
2021
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/151969 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | Marginalized stacked denoising autoencoder (mSDA), has recently emerged with demonstrated effectiveness in domain adaptation. In this paper, we investigate the rationale for why mSDA benefits domain adaptation tasks from the perspective of adaptive regularization. Our investigations focus on two types of feature corruption noise: Gaussian noise (mSDA g ) and Bernoulli dropout noise (mSDA bd ). Both theoretical and empirical results demonstrate that mSDA bd successfully boosts the adaptation performance but mSDA g fails to do so. We then propose a new mSDA with data-dependent multinomial dropout noise (mSDA md ) that overcomes the limitations of mSDA bd and further improves the adaptation performance. Our mSDA md is based on a more realistic assumption: different features are correlated and, thus, should be corrupted with different probabilities. Experimental results demonstrate the superiority of mSDA md to mSDA bd on the adaptation performance and the convergence speed. Finally, we propose a deep transferable feature coding (DTFC) framework for unsupervised domain adaptation. The motivation of DTFC is that mSDA fails to consider the distribution discrepancy across different domains in the feature learning process. We introduce a new element to mSDA: domain divergence minimization by maximum mean discrepancy. This element is essential for domain adaptation as it ensures the extracted deep features to have a small distribution discrepancy. The effectiveness of DTFC is verified by extensive experiments on three benchmark data sets for both Bernoulli dropout noise and multinomial dropout noise. |
---|