Unsupervised training sequence design: Efficient and generalizable agent training
To train generalizable Reinforcement Learning (RL) agents, researchers recently proposed the Unsupervised Environment Design (UED) framework, in which a teacher agent creates a very large number of training environments and a student agent trains on the experiences in these environments to be robust against unseen testing scenarios. For example, to train a student to master the “stepping over stumps” task, the teacher will create numerous training environments with varying stump heights and shapes. In this paper, we argue that UED neglects training efficiency and that its need for a very large number of environments (henceforth referred to as infinite-horizon training) makes it less suitable for training robots and non-expert humans. In real-world applications where creating new training scenarios is expensive or training efficiency is of critical importance, we want to maximize both the learning efficiency and the learning outcome of the student. To achieve efficient finite-horizon training, we propose a novel Markov Decision Process (MDP) formulation for the teacher agent, referred to as Unsupervised Training Sequence Design (UTSD). Specifically, we encode salient information from the student policy (e.g., behaviors and learning progress) into the teacher's state space, enabling the teacher to closely track the student's learning progress and consequently discover optimal training sequences with finite lengths. Additionally, we explore the teacher's efficient adaptation to unseen students at test time by employing a context-based meta-learning approach, which leverages the teacher's past experiences with various students. Finally, we empirically demonstrate our teacher's capability to design efficient and effective training sequences for students with varying capabilities.
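The abstract casts the teacher as an MDP whose state encodes salient student information (behaviors, learning progress), whose actions select the next training environment, and whose objective is an effective finite-length training sequence. The sketch below illustrates that formulation on a toy one-dimensional difficulty parameter. The `Student` and `UTSDTeacher` classes, the skill/progress dynamics, and the frontier-seeking heuristic policy are all illustrative assumptions, not the paper's implementation (which learns the teacher policy with RL and adapts it to unseen students via context-based meta-learning).

```python
import numpy as np

class Student:
    """Toy stand-in for an RL student: its skill on a 1-D task parameter
    improves fastest when trained just beyond its current ability."""
    def __init__(self, skill=0.0):
        self.skill = skill

    def train(self, difficulty):
        # Learning progress peaks when the environment's difficulty sits
        # near the frontier of the student's current skill.
        gain = max(0.0, 0.1 - abs(difficulty - (self.skill + 0.1)))
        self.skill += gain
        return gain

    def features(self):
        # Salient student information the teacher observes; the paper
        # encodes such behavior/progress features into the teacher's state.
        return np.array([self.skill])

class UTSDTeacher:
    """Teacher MDP sketch: state = student features, action = next training
    environment parameter, reward = the student's learning progress."""
    def __init__(self, difficulties):
        self.difficulties = difficulties

    def act(self, student_features):
        # Placeholder policy: pick the difficulty closest to slightly above
        # the observed skill. The paper learns this mapping instead.
        skill = float(student_features[0])
        return min(self.difficulties, key=lambda d: abs(d - (skill + 0.1)))

student = Student()
teacher = UTSDTeacher(np.linspace(0.0, 2.0, 21))

for step in range(10):  # a finite-horizon training sequence
    env_param = teacher.act(student.features())
    progress = student.train(env_param)  # serves as the teacher's reward
    print(f"step {step}: difficulty={env_param:.2f}, "
          f"progress={progress:.3f}, skill={student.skill:.3f}")
```

Even this heuristic version shows the key difference from UED: the teacher conditions each environment choice on the student's current state, so the curriculum terminates after a short, targeted sequence rather than sampling environments indefinitely.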
Main Authors: LI, Wenjun; VARAKANTHAM, Pradeep
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects: Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Collection: Research Collection School Of Computing and Information Systems
DOI: 10.1609/aaai.v38i12.29268
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://ink.library.smu.edu.sg/sis_research/9362
https://ink.library.smu.edu.sg/context/sis_research/article/10362/viewcontent/29268_Article_Text_33322_1_2_20240324_pvoa.pdf
Institution: Singapore Management University