SPRINQL : Sub-optimal demonstrations driven offline imitation learning

We focus on offline imitation learning (IL), which aims to mimic an expert's behavior using demonstrations without any interaction with the environment. One of the main challenges in offline IL is the limited support of expert demonstrations, which typically cover only a small fraction of the state-action space. While it may not be feasible to obtain numerous expert demonstrations, it is often possible to gather a larger set of sub-optimal demonstrations. For example, in treatment optimization problems, there are varying levels of doctor treatments available for different chronic conditions, ranging from treatment specialists and experienced general practitioners to less experienced general practitioners. Similarly, when robots are trained to imitate humans in routine tasks, they might learn from individuals with different levels of expertise and efficiency. In this paper, we propose an offline IL approach that leverages the larger set of sub-optimal demonstrations while effectively mimicking expert trajectories. Existing offline IL methods based on behavior cloning or distribution matching often face issues such as overfitting to the limited set of expert demonstrations or inadvertently imitating sub-optimal trajectories from the larger dataset. Our approach, which is based on inverse soft-Q learning, learns from both expert and sub-optimal demonstrations. It assigns higher importance (through learned weights) to aligning with expert demonstrations and lower importance to aligning with sub-optimal ones. A key contribution of our approach, called SPRINQL, is transforming the offline IL problem into a convex optimization over the space of Q functions. Through comprehensive experimental evaluations, we demonstrate that the SPRINQL algorithm achieves state-of-the-art (SOTA) performance on offline IL benchmarks.
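To make the idea in the abstract concrete, the sketch below shows one plausible form of a level-weighted inverse soft-Q matching term in PyTorch. It is a hypothetical illustration only: the network architecture, the loss form, the fixed per-level weights, and all names (`QNetwork`, `weighted_inverse_q_loss`, etc.) are assumptions of this sketch, not SPRINQL's actual formulation, which additionally learns the weights and reformulates the problem as a convex optimization over Q functions.

```python
# Hypothetical sketch of a level-weighted inverse soft-Q objective, in the
# spirit of the abstract above. Not the paper's actual objective.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Simple Q-network over discrete actions (illustrative only)."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)


def soft_value(q_values: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Soft (log-sum-exp) state value, as used in soft-Q-learning-style methods.
    return alpha * torch.logsumexp(q_values / alpha, dim=-1)


def weighted_inverse_q_loss(q_net, level_batches, level_weights, gamma=0.99):
    # Each demonstration level k (expert, experienced, novice, ...) contributes
    # a term scaled by its weight w_k, with expert data weighted highest.
    # Each batch is (states, actions, next_states); actions must be int64.
    total = torch.zeros(())
    for (s, a, s_next), w in zip(level_batches, level_weights):
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        v_next = soft_value(q_net(s_next))
        # Reward implied by the current Q: r(s, a) = Q(s, a) - gamma * V(s').
        implied_reward = q_sa - gamma * v_next
        # Push implied rewards up on demonstrated (s, a) pairs, more strongly
        # for higher-quality levels. In practice a regularizer on Q would be
        # added to keep the objective bounded.
        total = total - w * implied_reward.mean()
    return total
```

In this sketch, `level_weights` might be, e.g., `(1.0, 0.3)` for one expert and one sub-optimal dataset; per the abstract, SPRINQL learns such weights rather than fixing them by hand.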

Bibliographic Details
Main Authors: HOANG, Minh Huy, MAI, Tien, VARAKANTHAM, Pradeep
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Collection: Research Collection School Of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Subjects: Sub-optimal demonstrations; Imitation learning; Inverse Q learning; Reinforcement learning; Artificial Intelligence and Robotics
Online Access: https://ink.library.smu.edu.sg/sis_research/9821
https://ink.library.smu.edu.sg/context/sis_research/article/10821/viewcontent/Neurips_2024___SPRINQL.pdf
Institution: Singapore Management University