Imitating cost-constrained behaviors in reinforcement learning

Complex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning that aims to learn from expert demonstrations has been proposed as a viable alternative to solving these problems. Generally speaking, imitation lear...

Bibliographic Details
Main Authors:	SHAO, Qian; VARAKANTHAM, Pradeep; CHENG, Shih-Fen
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Artificial Intelligence and Robotics; Operations Research, Systems Engineering and Industrial Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9496 https://ink.library.smu.edu.sg/context/sis_research/article/10496/viewcontent/31512_Article_Text_35569_1_2_20240530_pvoa.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10496
record_format	dspace
spelling	sg-smu-ink.sis_research-10496 2024-11-11T06:02:52Z Imitating cost-constrained behaviors in reinforcement learning SHAO, Qian VARAKANTHAM, Pradeep CHENG, Shih-Fen
2024-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9496 info:doi/10.1609/icaps.v34i1.31512 https://ink.library.smu.edu.sg/context/sis_research/article/10496/viewcontent/31512_Article_Text_35569_1_2_20240530_pvoa.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Operations Research, Systems Engineering and Industrial Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Artificial Intelligence and Robotics Operations Research, Systems Engineering and Industrial Engineering
spellingShingle	Artificial Intelligence and Robotics Operations Research, Systems Engineering and Industrial Engineering SHAO, Qian VARAKANTHAM, Pradeep CHENG, Shih-Fen Imitating cost-constrained behaviors in reinforcement learning
description	Complex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning that aims to learn from expert demonstrations has been proposed as a viable alternative to solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or directly the behavioral policy by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused on imitation primarily in unconstrained settings (e.g., no limit on fuel consumed by the vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions on self-driving delivery vehicles are dependent not only on the route preferences/rewards (depending on past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging as decisions are not only dictated by the reward model but are also dependent on a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints through (a) Lagrangian-based method; (b) Meta-gradients to find a good trade-off between expected return and minimizing constraint violation; and (c) Cost-violation-based alternating gradient. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and our meta-gradient-based approach achieves the best performance.
format	text
author	SHAO, Qian VARAKANTHAM, Pradeep CHENG, Shih-Fen
author_facet	SHAO, Qian VARAKANTHAM, Pradeep CHENG, Shih-Fen
author_sort	SHAO, Qian
title	Imitating cost-constrained behaviors in reinforcement learning
title_short	Imitating cost-constrained behaviors in reinforcement learning
title_full	Imitating cost-constrained behaviors in reinforcement learning
title_fullStr	Imitating cost-constrained behaviors in reinforcement learning
title_full_unstemmed	Imitating cost-constrained behaviors in reinforcement learning
title_sort	imitating cost-constrained behaviors in reinforcement learning
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9496 https://ink.library.smu.edu.sg/context/sis_research/article/10496/viewcontent/31512_Article_Text_35569_1_2_20240530_pvoa.pdf
_version_	1816859095920017408

Similar Items