Towards advanced machine generalization via data interplay
Main Author:
Other Authors:
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/180357
Institution: Nanyang Technological University
Summary: Traditional machine learning methodologies presuppose that training and testing datasets are Independent and Identically Distributed (IID), i.e., that training and test samples are drawn from the same underlying distribution. However, this IID assumption often fails to hold in real-world applications, leading to substantial performance drops. This issue is known as the Out-of-Distribution (OOD) generalization problem, in which the test data differ from the training data in unforeseen ways. Additionally, suboptimal design choices at any stage of developing a deep learning model can further impair its generalization capability across a broad spectrum of tasks. In this thesis, we study the OOD generalization problem in the context of four application areas, ranging from conventional computer vision tasks to the emerging field of diffusion generative models. To improve robustness, we introduce a key innovation termed “data interplay”, aimed at more efficient utilization of training data. Specifically, we categorize data interplay into three distinct types, each focusing on a different level of data interaction: interplay between data points, between data groups, and between data modalities.
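The record does not include the thesis body, so the concrete mechanisms behind each type of interplay are not described here. As a hedged illustration only, one widely known technique that operates at the level of interplay between data points is mixup-style augmentation, which trains on convex combinations of paired samples and their labels. The sketch below assumes NumPy arrays with one-hot labels; it is not taken from the thesis itself.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    # Mixup (Zhang et al., 2018): a common form of data-point interplay.
    # Draws a mixing coefficient lam ~ Beta(alpha, alpha) and returns
    # convex combinations of a batch with a random permutation of itself.
    # Illustrative sketch only; not the method proposed in this thesis.
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing weight in [0, 1]
    perm = rng.permutation(len(x))      # random pairing of data points
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]   # y assumed one-hot
    return x_mixed, y_mixed

# Usage: a batch of 32 images with 10-class one-hot labels.
x = np.random.rand(32, 3, 32, 32).astype(np.float32)
y = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, size=32)]
x_mix, y_mix = mixup_batch(x, y)
```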