Optimizing expectation with guarantees in POMDPs

A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.
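The two GPO conditions can be made concrete with a small sketch. This is not the paper's algorithm, only an illustration of the objective on a hypothetical policy whose finitely many outcomes (reward sequences with probabilities) are enumerated directly: condition a) requires every outcome's discounted-sum payoff to reach the threshold t, and condition b) concerns the expectation over outcomes.

```python
def discounted_sum(rewards, gamma):
    """Discounted-sum payoff of a finite reward sequence: sum_i gamma^i * r_i."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

def satisfies_threshold(outcomes, gamma, t):
    """Condition a): every possible outcome yields payoff at least t."""
    return all(discounted_sum(rewards, gamma) >= t for rewards, _prob in outcomes)

def expected_payoff(outcomes, gamma):
    """Condition b) optimizes this quantity: the expected discounted-sum payoff."""
    return sum(prob * discounted_sum(rewards, gamma) for rewards, prob in outcomes)

# Hypothetical policy with two possible (reward sequence, probability) outcomes.
outcomes = [([1, 1, 1], 0.9), ([0, 2, 0], 0.1)]
gamma = 0.5

print(discounted_sum([1, 1, 1], gamma))        # 1 + 0.5 + 0.25 = 1.75
print(satisfies_threshold(outcomes, gamma, 1.0))  # True: both payoffs >= 1.0
print(expected_payoff(outcomes, gamma))        # 0.9 * 1.75 + 0.1 * 1.0 = 1.675
```

A plain "expectation" policy would maximize only `expected_payoff`; a "threshold" policy would only enforce `satisfies_threshold`; GPO asks for the best expectation among policies passing the threshold check.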


Bibliographic Details
Main Authors: CHATTERJEE, Krishnendu, PEREZ, Guillermo A., RASKIN, Jean-François, ZIKELIC, Dorde
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Subjects: Artificial Intelligence and Robotics
Online Access:https://ink.library.smu.edu.sg/sis_research/9071
https://ink.library.smu.edu.sg/context/sis_research/article/10074/viewcontent/11046_13_14574_1_2_20201228.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-10074
Description: A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.
DOI: 10.5555/3298023.3298109
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries)
Topic: Artificial Intelligence and Robotics
Publication date: 2017-02-01