Removing bias for out-of-distribution generalization

Bibliographic Details
Main Author: Qi, Jiaxin
Other Authors: Zhang Hanwang
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/168654
Institution: Nanyang Technological University
Description
Summary: Deep models have a strong ability to fit the training data and can therefore achieve high performance when the test data is sampled from the same distribution as the training data. In practice, however, deep models often underperform because the test data is usually Out-of-Distribution (OOD) relative to the training data; this is known as the OOD Generalization problem. The underlying reason is that during training, besides the causal effect (the causal relationships between inputs and outputs, which describe the data-generation process and do not change under any data distribution), models also learn bias (spurious correlations between inputs and outputs that exist only in the training distribution). Learning such bias causes the model to fail to generalize to OOD data. To help models achieve better OOD Generalization, we need to pursue the causal effect by removing the learned bias. However, because data organization formats vary and the given inputs differ, it is hard to propose a uniform bias-removal strategy. We therefore categorize OOD Generalization tasks into three camps and conduct a specific case study for each:
1) OOD Generalization with Multiple Modalities, where multiple modalities, such as language and images, are provided during training. We focus on a specific case, Visual Dialog, analyze the underlying causal relationships between its modalities, and propose two causal principles that remove the history bias and the user bias for better OOD performance.
2) OOD Generalization with Multiple Domains, where there is only one modality (images) but multiple training domains and their domain annotations are given. We focus on Domain Generalization (DG) and propose creating a new domain through cross-domain influence to remove the "spurious invariance" bias, helping current DG methods achieve better OOD performance.
3) OOD Generalization with No Additional Annotations, where only one modality (images) and one training domain are given, with no additional annotations such as domain or bias annotations. We focus on a specific case, Debiasing, and propose two algorithms for removing bias. First, we design a two-stage pipeline with re-weighting methods to effectively remove the underlying context bias. Second, because the context-estimation step used by current re-weighting methods tends to fail when the class effect and the context effect are entangled, we propose Invariant Risk Minimization for Context, which disentangles the context to enable better re-weighting, removing context bias and improving OOD Generalization for debiasing.
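The two ingredients named in case 3, re-weighting by (estimated) context and Invariant Risk Minimization, can be illustrated with generic building blocks. The following is a minimal sketch under stated assumptions, not the thesis's two-stage pipeline or its IRM-for-Context algorithm. It assumes that per-sample context labels have already been estimated (for example, by clustering) and uses simple inverse-frequency weights over (class, context) groups. For IRM it uses the standard IRMv1 penalty of Arjovsky et al. for binary logistic outputs. All function names (`group_reweight`, `irmv1_penalty`, `irm_objective`) are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

# Sketch 1 (assumption): group re-weighting with already-estimated context labels.
def group_reweight(labels: Sequence[Hashable], contexts: Sequence[Hashable]) -> list[float]:
    """Give each sample weight 1 / |its (class, context) group|, rescaled to mean 1.

    Rare class-context combinations are up-weighted, which weakens the
    spurious class-context correlation seen by a weighted training loss.
    """
    if len(labels) != len(contexts):
        raise ValueError("labels and contexts must have the same length")
    if not labels:
        return []
    groups = list(zip(labels, contexts))
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]
    scale = len(raw) / sum(raw)  # make the weights average to 1
    return [w * scale for w in raw]

# Sketch 2 (assumption): IRMv1 penalty for binary logistic outputs.
def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_with_logits(z: float, y: float) -> float:
    """Numerically stable binary cross-entropy for a single logit z and target y."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def irmv1_penalty(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Squared gradient of one environment's risk w.r.t. a dummy scale w at w = 1.

    With R(w) = mean(BCE(sigmoid(w * z_i), y_i)), dR/dw at w = 1 equals
    mean((sigmoid(z_i) - y_i) * z_i); the penalty is that value squared.
    """
    n = len(logits)
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, targets)) / n
    return grad * grad

def irm_objective(envs: Sequence[tuple[Sequence[float], Sequence[float]]], lam: float) -> float:
    """Sum over environments of empirical risk plus lam times the IRMv1 penalty."""
    total = 0.0
    for logits, targets in envs:
        risk = sum(bce_with_logits(z, y) for z, y in zip(logits, targets)) / len(logits)
        total += risk + lam * irmv1_penalty(logits, targets)
    return total

if __name__ == "__main__":
    w = group_reweight([0, 0, 0, 1], ["a", "a", "b", "b"])
    print([round(x, 3) for x in w])  # → [0.667, 0.667, 1.333, 1.333]
    print(round(irm_objective([([0.0], [1]), ([0.0], [0])], lam=10.0), 4))  # → 1.3863
```

In this sketch, the re-weighting only helps as much as the context labels are accurate. That dependence is the weakness the summary attributes to current re-weighting when class and context effects are entangled. The IRM penalty is zero only when the same classifier scale is simultaneously optimal in every environment, which is why it is a common tool for discouraging environment-specific (spurious) features.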