Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty
Markov Decision Problems (MDPs) offer an effective mechanism for planning under uncertainty. However, due to unavoidable uncertainty over models, it is difficult to obtain an exact specification of an MDP. We are interested in solving MDPs where transition and reward functions are not exactly specified. Existing research has primarily focused on computing infinite-horizon stationary policies when optimizing robustness, regret, and percentile-based objectives. We focus specifically on finite-horizon problems, with a special emphasis on objectives that are separable over individual instantiations of model uncertainty (i.e., objectives that can be expressed as a sum over instantiations of model uncertainty): (a) First, we identify two separable objectives for uncertain MDPs: Average Value Maximization (AVM) and Confidence Probability Maximization (CPM). (b) Second, we provide optimization-based solutions to compute policies for uncertain MDPs with such objectives. In particular, we exploit the separability of the AVM and CPM objectives by employing Lagrangian dual decomposition (LDD). (c) Finally, we demonstrate the utility of the LDD approach on a benchmark problem from the literature.
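To make the separability claim concrete, here is a notational sketch under assumptions of our own: suppose model uncertainty is represented by Q sampled instantiations of the transition and reward functions. The symbols Q, \xi_q, v^{\pi}, and the threshold \beta below are illustrative choices, not notation taken from the paper.

```latex
% Illustrative forms of the two separable objectives (all notation assumed,
% not the paper's): v^{\pi}(\xi_q) is the finite-horizon value of policy
% \pi evaluated on model instantiation \xi_q; \beta is a value threshold.
\begin{align*}
  \text{(AVM)} \quad & \max_{\pi} \;\; \frac{1}{Q} \sum_{q=1}^{Q} v^{\pi}(\xi_q) \\
  \text{(CPM)} \quad & \max_{\pi} \;\; \frac{1}{Q} \sum_{q=1}^{Q} \mathbb{1}\!\left[\, v^{\pi}(\xi_q) \ge \beta \,\right]
\end{align*}
```

Both expressions are sums with one term per instantiation, so introducing per-instantiation copies of the policy decouples the terms; that is the structure a dual decomposition can exploit.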
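In the same spirit, the LDD approach in item (b) can be sketched as a generic consensus-style Lagrangian dual decomposition loop. Everything below is an assumption-laden illustration, not the paper's algorithm: the helper `solve_mdp_instantiation`, the multiplier shapes, and the diminishing step size are hypothetical placeholders.

```python
import numpy as np

def ldd_sketch(instantiations, n_states, n_actions, solve_mdp_instantiation,
               iters=100):
    """Generic consensus-style Lagrangian dual decomposition (a sketch).

    Relax the coupling constraint that every instantiation's policy equals a
    shared policy, then take subgradient steps on the multipliers.
    `solve_mdp_instantiation(xi, lam)` is a hypothetical solver assumed to
    return the best policy for model sample `xi` under Lagrangian penalty
    `lam`, encoded as an (n_states, n_actions) array.
    """
    Q = len(instantiations)
    # One multiplier array per instantiation; starting at zero keeps the
    # multipliers summing to zero, as the consensus update requires.
    lam = [np.zeros((n_states, n_actions)) for _ in range(Q)]
    consensus = np.zeros((n_states, n_actions))
    for t in range(iters):
        # 1. Separability: the Q subproblems decouple and can be solved
        #    independently (even in parallel).
        policies = [solve_mdp_instantiation(xi, l)
                    for xi, l in zip(instantiations, lam)]
        # 2. Average the per-instantiation policies to form the consensus.
        consensus = np.mean(policies, axis=0)
        # 3. Subgradient step pushing each copy toward agreement.
        step = 1.0 / np.sqrt(t + 1)  # assumed diminishing step size
        lam = [l + step * (p - consensus) for l, p in zip(lam, policies)]
    # A real implementation would extract a feasible shared policy from the
    # (possibly fractional) consensus; this sketch returns the average.
    return consensus
```

Because each subproblem touches only one instantiation, the per-iteration work scales with the number of model samples and parallelizes across them; this is the practical payoff of separability that the abstract alludes to.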
Saved in:

Main Authors: | ADULYASAK, Yossiri; VARAKANTHAM, Pradeep; AHMED, Asrar; JAILLET, Patrick |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2015 |
Subjects: | Markov Decision Problems (MDPs); Lagrangian Dual Decomposition; Bayesian Reinforcement Learning; Robust MDPs; Artificial Intelligence and Robotics; Computer Sciences; Numerical Analysis and Scientific Computing |
Online Access: | https://ink.library.smu.edu.sg/sis_research/2916 https://ink.library.smu.edu.sg/context/sis_research/article/3916/viewcontent/9843_44958_1_PB.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-3916 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-3916 2020-03-24T08:21:44Z Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty ADULYASAK, Yossiri; VARAKANTHAM, Pradeep; AHMED, Asrar; JAILLET, Patrick. Markov Decision Problems (MDPs) offer an effective mechanism for planning under uncertainty. However, due to unavoidable uncertainty over models, it is difficult to obtain an exact specification of an MDP. We are interested in solving MDPs where transition and reward functions are not exactly specified. Existing research has primarily focused on computing infinite-horizon stationary policies when optimizing robustness, regret, and percentile-based objectives. We focus specifically on finite-horizon problems, with a special emphasis on objectives that are separable over individual instantiations of model uncertainty (i.e., objectives that can be expressed as a sum over instantiations of model uncertainty): (a) First, we identify two separable objectives for uncertain MDPs: Average Value Maximization (AVM) and Confidence Probability Maximization (CPM). (b) Second, we provide optimization-based solutions to compute policies for uncertain MDPs with such objectives. In particular, we exploit the separability of the AVM and CPM objectives by employing Lagrangian dual decomposition (LDD). (c) Finally, we demonstrate the utility of the LDD approach on a benchmark problem from the literature. 2015-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/2916 https://ink.library.smu.edu.sg/context/sis_research/article/3916/viewcontent/9843_44958_1_PB.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University. Markov Decision Problems (MDPs); Lagrangian Dual Decomposition; Bayesian Reinforcement Learning; Robust MDPs; Artificial Intelligence and Robotics; Computer Sciences; Numerical Analysis and Scientific Computing |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Markov Decision Problems (MDPs); Lagrangian Dual Decomposition; Bayesian Reinforcement Learning; Robust MDPs; Artificial Intelligence and Robotics; Computer Sciences; Numerical Analysis and Scientific Computing |
spellingShingle | Markov Decision Problems (MDPs); Lagrangian Dual Decomposition; Bayesian Reinforcement Learning; Robust MDPs; Artificial Intelligence and Robotics; Computer Sciences; Numerical Analysis and Scientific Computing; ADULYASAK, Yossiri; VARAKANTHAM, Pradeep; AHMED, Asrar; JAILLET, Patrick; Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty |
description | Markov Decision Problems (MDPs) offer an effective mechanism for planning under uncertainty. However, due to unavoidable uncertainty over models, it is difficult to obtain an exact specification of an MDP. We are interested in solving MDPs where transition and reward functions are not exactly specified. Existing research has primarily focused on computing infinite-horizon stationary policies when optimizing robustness, regret, and percentile-based objectives. We focus specifically on finite-horizon problems, with a special emphasis on objectives that are separable over individual instantiations of model uncertainty (i.e., objectives that can be expressed as a sum over instantiations of model uncertainty): (a) First, we identify two separable objectives for uncertain MDPs: Average Value Maximization (AVM) and Confidence Probability Maximization (CPM). (b) Second, we provide optimization-based solutions to compute policies for uncertain MDPs with such objectives. In particular, we exploit the separability of the AVM and CPM objectives by employing Lagrangian dual decomposition (LDD). (c) Finally, we demonstrate the utility of the LDD approach on a benchmark problem from the literature. |
format | text |
author | ADULYASAK, Yossiri; VARAKANTHAM, Pradeep; AHMED, Asrar; JAILLET, Patrick |
author_facet | ADULYASAK, Yossiri; VARAKANTHAM, Pradeep; AHMED, Asrar; JAILLET, Patrick |
author_sort | ADULYASAK, Yossiri |
title | Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty |
title_short | Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty |
title_full | Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty |
title_fullStr | Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty |
title_full_unstemmed | Solving Uncertain MDPs with Objectives that are Separable over Instantiations of Model Uncertainty |
title_sort | solving uncertain mdps with objectives that are separable over instantiations of model uncertainty |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2015 |
url | https://ink.library.smu.edu.sg/sis_research/2916 https://ink.library.smu.edu.sg/context/sis_research/article/3916/viewcontent/9843_44958_1_PB.pdf |
_version_ | 1770572735842353152 |