Optimizing expectation with guarantees in POMDPs
A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.
Main Authors: | CHATTERJEE, Krishnendu; PEREZ, Guillermo A.; RASKIN, Jean-François; ZIKELIC, Dorde |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2017 |
Subjects: | Artificial Intelligence and Robotics |
Online Access: | https://ink.library.smu.edu.sg/sis_research/9071 https://ink.library.smu.edu.sg/context/sis_research/article/10074/viewcontent/11046_13_14574_1_2_20201228.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-10074 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-10074 (2024-08-01T15:24:01Z). Optimizing expectation with guarantees in POMDPs. CHATTERJEE, Krishnendu; PEREZ, Guillermo A.; RASKIN, Jean-François; ZIKELIC, Dorde. Published 2017-02-01. text, application/pdf. DOI: 10.5555/3298023.3298109. License: http://creativecommons.org/licenses/by-nc-nd/4.0/. Research Collection School Of Computing and Information Systems. |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Artificial Intelligence and Robotics |
description |
A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks. |
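The GPO objective in the description above can be illustrated with a small sketch. The following toy example is illustrative only and is not the authors' algorithm: the policy set, payoff numbers, and function names (`discounted_sum`, `gpo_best`) are invented here. Each candidate policy is summarized by its possible discounted-sum payoffs and their probabilities; constraint a) filters out any policy with a possible payoff below the threshold t, and objective b) picks the highest expectation among the survivors.

```python
# Illustrative sketch of the GPO objective on hypothetical toy data
# (not the paper's method; all names and numbers are invented here).

def discounted_sum(rewards, gamma):
    """Discounted-sum payoff of a reward sequence: sum_i gamma^i * r_i."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

def gpo_best(policies, t):
    """Among policies whose every possible payoff is at least t
    (constraint a), return one with maximal expected payoff (objective b)."""
    feasible = [p for p in policies
                if all(payoff >= t for payoff, _ in p["outcomes"])]
    if not feasible:
        return None  # no policy meets the worst-case guarantee
    def expectation(p):
        return sum(payoff * prob for payoff, prob in p["outcomes"])
    return max(feasible, key=expectation)

# Toy data: each policy is a distribution of (payoff, probability) outcomes.
policies = [
    {"name": "risky",   "outcomes": [(10.0, 0.9), (-5.0, 0.1)]},  # E = 8.5
    {"name": "safe",    "outcomes": [(4.0, 1.0)]},                # E = 4.0
    {"name": "guarded", "outcomes": [(6.0, 0.6), (2.0, 0.4)]},    # E = 4.4
]
best = gpo_best(policies, t=1.0)
# "risky" has the best expectation but can pay -5.0, violating the
# threshold; among the guaranteed policies, "guarded" maximizes expectation.
```

This mirrors the tension the abstract describes: a pure expectation maximizer would choose the risky policy, while GPO optimizes expectation only over policies whose every outcome meets the guarantee.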
format |
text |
author |
CHATTERJEE, Krishnendu PEREZ, Guillermo A. RASKIN, Jean-François ZIKELIC, Dorde |
title |
Optimizing expectation with guarantees in POMDPs |