Removing bias for out-of-distribution generalization


Bibliographic Details
Main Author: Qi, Jiaxin
Other Authors: Zhang, Hanwang
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects:
Online Access: https://hdl.handle.net/10356/168654
Institution: Nanyang Technological University
Description
Abstract: Deep models have a strong ability to fit the training data and can therefore achieve high performance when the testing data is sampled from the same distribution as the training data. In practice, however, deep models often fall short because the testing data is usually Out-of-Distribution (OOD) with respect to the training data, which is known as the OOD Generalization problem. The underlying reason is that, during training, besides the causal effect, i.e., the causalities between inputs and outputs that describe the data generation process and do not change under any data distribution, the models also learn the bias, i.e., the spurious correlations between inputs and outputs that exist only in the training distribution; learning such bias makes the model fail to generalize to OOD data. To help models achieve better OOD Generalization performance, we need to pursue the causal effect by removing the learned bias. However, due to the various data organization formats and the different inputs that may be given, it is hard to propose a uniform bias-removal strategy. We therefore categorize OOD Generalization tasks into three camps and conduct a specific case study for each:
1) OOD Generalization with Multiple Modalities, where multiple modalities, such as language and image, are provided in training. We focus on a specific case, Visual Dialog, analyze the underlying causal relationships between the modalities, and propose two causal principles to remove the history bias and the user bias for better OOD performance.
2) OOD Generalization with Multiple Domains, where there is only one modality, images, but multiple training domains and their domain annotations are given. We focus on Domain Generalization (DG) and propose to create a new domain by cross-domain influence to remove the "spurious invariance" bias and help current DG methods achieve better OOD performance.
3) OOD Generalization with no Additional Annotations, where only one modality, images, and a single training domain with no additional annotations (such as domain or bias annotations) are given. We focus on a specific case, Debiasing, and propose two algorithms for removing bias. First, we design a two-stage pipeline with re-weighting methods to effectively remove the underlying context bias. Second, because the context estimation used by current re-weighting methods is hard to make succeed when the class effect and the context effect are entangled, we propose Invariant Risk Minimization for Context to disentangle the context, which yields better re-weighting for removing context bias and thus better OOD Generalization for debiasing.
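The abstract does not spell out the re-weighting pipeline in detail; as a rough illustration of the general idea behind context re-weighting for debiasing, the sketch below down-weights samples from over-represented (estimated) contexts so that no single context dominates the training loss. The function names, the inverse-frequency weighting scheme, and the use of discrete `context_ids` are assumptions for illustration, not the thesis's actual two-stage method.

```python
import torch
import torch.nn.functional as F

def context_balanced_weights(context_ids: torch.Tensor, num_contexts: int) -> torch.Tensor:
    """Hypothetical helper: per-sample weights inversely proportional to the
    frequency of each (estimated) context, normalized to have mean 1."""
    counts = torch.bincount(context_ids, minlength=num_contexts).float()
    weights = 1.0 / counts.clamp(min=1.0)          # rare contexts get larger weight
    per_sample = weights[context_ids]
    return per_sample * per_sample.numel() / per_sample.sum()

def reweighted_loss(logits, labels, context_ids, num_contexts):
    """Classification loss where each sample is re-weighted by its context weight."""
    w = context_balanced_weights(context_ids, num_contexts)
    return (w * F.cross_entropy(logits, labels, reduction="none")).mean()
```

In this sketch the context labels are assumed to come from some upstream estimation step; the point is only that once contexts are (approximately) known, re-weighting pushes the model toward a context-balanced risk rather than the biased training risk.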
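Likewise, the thesis's "Invariant Risk Minimization for Context" is only named in the abstract. As a rough illustration of the underlying mechanism, here is a sketch of the standard IRMv1 penalty (Arjovsky et al., 2019), applied by treating estimated context groups as IRM environments. The grouping scheme and function names are assumptions for illustration, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """IRMv1 penalty: squared gradient of the risk with respect to a fixed dummy
    scale on the classifier. A nonzero gradient indicates the shared classifier
    is not simultaneously optimal for this environment (here, a context group)."""
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objective(logits, labels, context_ids, lam: float = 1.0):
    """Average risk plus the IRM penalty averaged over (estimated) context groups."""
    risk = F.cross_entropy(logits, labels)
    penalty = torch.stack([
        irm_penalty(logits[context_ids == c], labels[context_ids == c])
        for c in torch.unique(context_ids)
    ]).mean()
    return risk + lam * penalty
```

Intuitively, a predictor that relies on context-specific shortcuts cannot be optimal across all context groups at once, so the penalty pushes the representation toward features whose effect on the label is invariant across contexts, which is the disentanglement the abstract describes.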