Learning control policies for stochastic systems with reach-avoid guarantees

Bibliographic Details
Main Authors:	ZIKELIC, Dorde, LECHNER, Mathias, HENZINGER, Thomas A., CHATTERJEE, Krishnendu
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Programming Languages and Compilers
Online Access:	https://ink.library.smu.edu.sg/sis_research/9081 https://ink.library.smu.edu.sg/context/sis_research/article/10084/viewcontent/26407_Article_Text_30470_1_2_20230626.pdf
Institution:	Singapore Management University

Description
Summary:	We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p ∈ [0,1] over the infinite time horizon in general Lipschitz continuous systems. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.