Learning control policies for stochastic systems with reach-avoid guarantees

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p ∈ [0,1] over the infinite time horizon in general Lipschitz continuous systems. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach addresses several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy that does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.
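The abstract describes certificates represented as neural networks that satisfy supermartingale-style constraints over the stochastic dynamics. The sketch below is a minimal, hypothetical illustration of how one such constraint, an expected decrease of the certificate along sampled transitions, could be trained for; it is not the authors' implementation, and it omits the paper's other RASM conditions and the formal verification step. The names CertificateNet and step, and parameters such as eps and noise_scale, are illustrative assumptions.

```python
# Hypothetical sketch: training a neural certificate to satisfy an
# expected-decrease (supermartingale-style) condition on sampled transitions.
# All names, dynamics, and parameters are illustrative assumptions.
import torch
import torch.nn as nn


class CertificateNet(nn.Module):
    """Small MLP mapping a state to a scalar, non-negative certificate value (assumed architecture)."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # Softplus keeps the output non-negative
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def step(x: torch.Tensor, noise_scale: float = 0.05) -> torch.Tensor:
    """Placeholder stochastic dynamics under a fixed policy (pure assumption)."""
    return 0.9 * x + noise_scale * torch.randn_like(x)


def expected_decrease_loss(cert: CertificateNet, states: torch.Tensor,
                           n_samples: int = 16, eps: float = 0.1) -> torch.Tensor:
    """Penalize states where the Monte Carlo estimate of E[V(next state)]
    fails to drop below V(state) - eps."""
    v_now = cert(states)
    # Sample several successor states per state and average the certificate value.
    nxt = torch.stack([step(states) for _ in range(n_samples)], dim=0)
    v_next = cert(nxt.reshape(-1, states.shape[-1])).reshape(n_samples, -1).mean(dim=0)
    return torch.relu(v_next - v_now + eps).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    cert = CertificateNet(state_dim=2)
    opt = torch.optim.Adam(cert.parameters(), lr=1e-3)
    for _ in range(200):
        batch = 4.0 * (torch.rand(256, 2) - 0.5)  # states sampled from a toy domain
        loss = expected_decrease_loss(cert, batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final expected-decrease violation: {loss.item():.4f}")
```

A training loss like this only encourages the condition on sampled states; the paper's guarantees come from additionally verifying the learned certificate over the whole state space, which this sketch does not attempt.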

Bibliographic Details
Main Authors: ZIKELIC, Dorde; LECHNER, Mathias; HENZINGER, Thomas A.; CHATTERJEE, Krishnendu
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Programming Languages and Compilers
DOI: 10.1609/aaai.v37i10.26407
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Online Access: https://ink.library.smu.edu.sg/sis_research/9081
https://ink.library.smu.edu.sg/context/sis_research/article/10084/viewcontent/26407_Article_Text_30470_1_2_20230626.pdf
Institution: Singapore Management University