Trust-region inverse reinforcement learning

This paper proposes a new unified inverse reinforcement learning (IRL) framework based on trust-region methods and a recently proposed Pontryagin differential programming (PDP) method in Jin et al. (2020), which aims to learn the parameters in both the system model and the cost function for three ty...

Bibliographic Details
Main Authors:	Cao, Kun; Xie, Lihua
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Electrical and electronic engineering; Trust Region Methods; PMP
Online Access:	https://hdl.handle.net/10356/170705
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-170705
record_format	dspace
spelling	sg-ntu-dr.10356-170705 2023-09-26T05:54:28Z Trust-region inverse reinforcement learning Cao, Kun Xie, Lihua School of Electrical and Electronic Engineering Engineering::Electrical and electronic engineering Trust Region Methods PMP This paper proposes a new unified inverse reinforcement learning (IRL) framework based on trust-region methods and a recently proposed Pontryagin differential programming (PDP) method in Jin et al. (2020), which aims to learn the parameters in both the system model and the cost function for three types of problems, namely, N-player nonzero-sum multistage games, 2-player zero-sum multistage games and 1-player optimal control, from demonstrated trajectories. Unlike existing frameworks that use the gradient to update the learning parameters, our framework updates them with the candidate solution of a trust-region subproblem (TRS), whose required gradient and Hessian are obtained by differentiating Pontryagin's Maximum Principle (PMP) equations once and twice, respectively. The differentiated equations are shown to be equivalent to the PMP equations for affine-quadratic games / optimal control problems and can be solved by explicit recursions. Extensive simulation examples and comparisons are presented to demonstrate the effectiveness of our proposed algorithm. Nanyang Technological University This work was supported by the Wallenberg-NTU Presidential Postdoctoral Fellowship in Nanyang Technological University, Singapore. 2023-09-26T05:54:28Z 2023-09-26T05:54:28Z 2023 Journal Article Cao, K. & Xie, L. (2023). Trust-region inverse reinforcement learning. IEEE Transactions On Automatic Control, 1-8. https://dx.doi.org/10.1109/TAC.2023.3274629 0018-9286 https://hdl.handle.net/10356/170705 10.1109/TAC.2023.3274629 2-s2.0-85159816020 1 8 en IEEE Transactions on Automatic Control © 2023 IEEE. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering Trust Region Methods PMP
spellingShingle	Engineering::Electrical and electronic engineering Trust Region Methods PMP Cao, Kun Xie, Lihua Trust-region inverse reinforcement learning
description	This paper proposes a new unified inverse reinforcement learning (IRL) framework based on trust-region methods and a recently proposed Pontryagin differential programming (PDP) method in Jin et al. (2020), which aims to learn the parameters in both the system model and the cost function for three types of problems, namely, N-player nonzero-sum multistage games, 2-player zero-sum multistage games and 1-player optimal control, from demonstrated trajectories. Unlike existing frameworks that use the gradient to update the learning parameters, our framework updates them with the candidate solution of a trust-region subproblem (TRS), whose required gradient and Hessian are obtained by differentiating Pontryagin's Maximum Principle (PMP) equations once and twice, respectively. The differentiated equations are shown to be equivalent to the PMP equations for affine-quadratic games / optimal control problems and can be solved by explicit recursions. Extensive simulation examples and comparisons are presented to demonstrate the effectiveness of our proposed algorithm.
author2	School of Electrical and Electronic Engineering
author_facet	School of Electrical and Electronic Engineering Cao, Kun Xie, Lihua
format	Article
author	Cao, Kun Xie, Lihua
author_sort	Cao, Kun
title	Trust-region inverse reinforcement learning
title_short	Trust-region inverse reinforcement learning
title_full	Trust-region inverse reinforcement learning
title_fullStr	Trust-region inverse reinforcement learning
title_full_unstemmed	Trust-region inverse reinforcement learning
title_sort	trust-region inverse reinforcement learning
publishDate	2023
url	https://hdl.handle.net/10356/170705
_version_	1779156524703154176
