Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs)

Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models a...

Bibliographic Details
Main Authors:	AHMED, Asrar; VARAKANTHAM, Pradeep; LOWALEKAR, Meghna; ADULYASAK, Yossiri; JAILLET, Patrick
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2017
Subjects:	Artificial Intelligence and Robotics; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/3937 https://ink.library.smu.edu.sg/context/sis_research/article/4939/viewcontent/Sampling_based_approach_regret_MDP_JAIR_pv.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4939
record_format	dspace
spelling	sg-smu-ink.sis_research-4939 2020-03-25T08:51:08Z Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs) AHMED, Asrar VARAKANTHAM, Pradeep LOWALEKAR, Meghna ADULYASAK, Yossiri JAILLET, Patrick 2017-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3937 info:doi/10.1613/jair.5242 https://ink.library.smu.edu.sg/context/sis_research/article/4939/viewcontent/Sampling_based_approach_regret_MDP_JAIR_pv.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Theory and Algorithms
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Artificial Intelligence and Robotics; Theory and Algorithms
description	Markov Decision Processes (MDPs) are an effective model for representing decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives for solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies, which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR, respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, is better than directly optimizing the regret.
format	text
author	AHMED, Asrar; VARAKANTHAM, Pradeep; LOWALEKAR, Meghna; ADULYASAK, Yossiri; JAILLET, Patrick
title	Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs)
publisher	Institutional Knowledge at Singapore Management University
publishDate	2017
url	https://ink.library.smu.edu.sg/sis_research/3937 https://ink.library.smu.edu.sg/context/sis_research/article/4939/viewcontent/Sampling_based_approach_regret_MDP_JAIR_pv.pdf
_version_	1770573998590001152
