Learning to generalize to new tasks/domains with limited data

Bibliographic Details
Main Author: Peng, Danni
Other Authors: Sinno Jialin Pan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171769
Institution: Nanyang Technological University
Description
Summary: The goal of Artificial Intelligence (AI) research is to develop a system that not only performs tasks comparably to humans (e.g., understanding language and vision) but also learns new tasks the way humans do. While the former has largely been achieved with recent advances in large AI models, the latter remains a challenge for AI researchers. A key trait of human beings is the ability to generalize knowledge from past experience, achieving satisfactory performance on new tasks with little to no additional learning (i.e., limited data). In this thesis, I focus on three areas of study with different problem settings, proposing learning-based algorithms that improve a system's ability to generalize to unseen tasks or domains when data is limited.

My first focus is meta-learning, which aims to acquire common knowledge from a set of tasks to facilitate learning new tasks from few examples. A popular approach captures this common knowledge in the form of a global initialization. To further handle tasks drawn from different distributions, recent work conditions the global initialization on task-specific, feature-based representations. My work goes a step further by utilizing the task optimization process itself to derive the task representation, which carries additional information about how well the global initialization suits individual task learning. A task representation learned in this way can therefore better adapt the global initialization into a more beneficial one; a sketch of this idea is given below. The method is evaluated in two real-world application domains, few-shot image classification and user cold-start recommendation, to demonstrate its effectiveness.

Meta-learning derives common knowledge by assuming that all tasks are available at once. In real-world situations, however, tasks are often learned sequentially, with restricted access to previous tasks' data. This setting falls under another field of study termed continual learning (CL). Unlike most CL work, which focuses on preventing forgetting of previous tasks, my research focuses on leveraging knowledge gained from previous tasks to enhance performance on the current task, an ability known as forward transfer, which is especially significant when the data of a single task is limited. I study this in the context of incremental updates of recommender systems: data arriving in different periods can be treated as different tasks, and incrementally updating the recommender system can be seen as a form of CL, so historical trends mined from past tasks can be exploited to achieve forward transfer, i.e., to improve performance in the current period. To this end, I develop a model generation method that learns to leverage models from past periods to generate a superior model for the current period. The meta-model generator adopts a recurrent neural network design to exploit its ability to capture sequential patterns; see the second sketch below.
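To make the first idea concrete, here is a minimal sketch of deriving a task representation from the inner-loop optimization signal and using it to modulate a global initialization. It assumes a MAML-style single inner step; the names (`Learner`, `TaskEncoder`, `adapt`) and the additive-shift modulation are hypothetical illustrations, not the thesis's actual architecture.

```python
# Minimal sketch (PyTorch): condition a global initialization on a task
# representation derived from inner-loop gradients. All names are
# hypothetical, illustrating the idea rather than the thesis's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Learner(nn.Module):
    """Small model whose initial parameters act as the global initialization."""
    def __init__(self, in_dim=32, out_dim=5):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.fc(x)

class TaskEncoder(nn.Module):
    """Maps flattened inner-loop gradients to a per-parameter modulation."""
    def __init__(self, n_params, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, n_params))

    def forward(self, flat_grad):
        return self.net(flat_grad)

def adapt(learner, encoder, support_x, support_y, inner_lr=0.1):
    params = list(learner.parameters())
    loss = F.cross_entropy(learner(support_x), support_y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # The task representation comes from the optimization signal (gradients),
    # which reflects how well the global init fits this particular task.
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    shift = encoder(flat_grad)
    # Modulate the global initialization, then take one inner step (for
    # brevity, the step reuses gradients computed at the unmodulated point).
    adapted, offset = [], 0
    for p, g in zip(params, grads):
        n = p.numel()
        adapted.append(p + shift[offset:offset + n].view_as(p) - inner_lr * g)
        offset += n
    return adapted

learner = Learner()
encoder = TaskEncoder(sum(p.numel() for p in learner.parameters()))
x, y = torch.randn(10, 32), torch.randint(0, 5, (10,))
adapted_params = adapt(learner, encoder, x, y)  # one few-shot task
```

In a full meta-learning loop, the adapted parameters would be evaluated on the task's query set and the query loss backpropagated into both the global initialization and the encoder.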
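For the continual-learning part, the second sketch illustrates an RNN-based meta-model generator that reads the flattened parameters of past periods' models and emits parameters for the current period. `ModelGenerator`, the GRU choice, and the flat-parameter encoding are assumptions for illustration.

```python
# Minimal sketch (PyTorch): an RNN meta-model generator for incremental
# recommender updates. Names and the GRU design are illustrative assumptions.
import torch
import torch.nn as nn

class ModelGenerator(nn.Module):
    def __init__(self, n_params, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_params, hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, n_params)

    def forward(self, past_params):
        # past_params: (1, T, n_params), one flattened model per past period.
        out, _ = self.rnn(past_params)
        # The last hidden state summarizes the sequential trend across
        # periods; the head maps it to a new flat parameter vector.
        return self.head(out[:, -1])

n_params = 165                          # size of the base model, flattened
gen = ModelGenerator(n_params)
history = torch.randn(1, 4, n_params)   # models from 4 past periods
new_params = gen(history)               # generated model for the next period
```

In training, the generated vector would be reshaped back into the base model and scored on the current period's data, with the resulting loss backpropagated into the generator.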
My third research focus is generalizing across domains. Unlike tasks, domains are defined specifically by variations in the input distribution. When the target domain is completely inaccessible during training and the model must learn to generalize from multiple source domains, the problem is termed domain generalization (DG). A common pitfall of existing DG methods is the risk of overfitting to the source domains. To address this issue, some works interpolate between source data to cover the unseen regions within the convex hull of the source domains. My research further explores extrapolation to extend beyond the convex hull and achieve better generalization. To avoid the adverse effects of uncontrolled extrapolation, I carefully design a strategy that generates sample weights based on gradients and learns them towards flatter minima. Experiments demonstrate that the proposed strategy better covers the regions outside the source domains, and that the resulting loss minima are flatter and wider, aligning better with the target optima; a sketch of the idea follows.
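As a rough illustration of this third direction, the sketch below forms an affine combination of per-domain losses whose gradient-derived weights sum to 1 but may leave [0, 1], which is what permits stepping outside the sources' convex hull. The gradient-norm penalty is a generic stand-in for learning towards flatter minima, not necessarily the thesis's strategy, and every name here is hypothetical.

```python
# Minimal sketch (PyTorch): extrapolation beyond the convex hull of source
# domains via gradient-derived affine loss weights, plus a gradient-norm
# penalty as a flatness proxy. All names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def extrapolated_loss(model, domain_batches, weight_head, rho=0.05):
    params = list(model.parameters())
    losses, grad_feats = [], []
    for x, y in domain_batches:
        loss = F.cross_entropy(model(x), y)
        g = torch.autograd.grad(loss, params, create_graph=True)
        grad_feats.append(torch.cat([t.reshape(-1) for t in g]))
        losses.append(loss)
    # One logit per domain, derived from that domain's gradient. Centering
    # around the uniform weight keeps the weights summing to 1 (affine),
    # while letting individual weights leave [0, 1]: that is what allows
    # extrapolation outside the sources' convex hull.
    logits = torch.stack([weight_head(f).squeeze() for f in grad_feats])
    w = 1.0 / len(losses) + (logits - logits.mean())
    weighted = (w * torch.stack(losses)).sum()
    # Flatness proxy: penalizing the gradient norm of the weighted loss
    # nudges optimization towards flatter, wider minima.
    g_w = torch.autograd.grad(weighted, params, create_graph=True)
    return weighted + rho * torch.cat([t.reshape(-1) for t in g_w]).norm()

model = nn.Linear(32, 5)                 # stand-in feature-to-class model
weight_head = nn.Linear(sum(p.numel() for p in model.parameters()), 1)
batches = [(torch.randn(8, 32), torch.randint(0, 5, (8,))) for _ in range(3)]
extrapolated_loss(model, batches, weight_head).backward()
```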