Bootstrapping simulation-based algorithms with a suboptimal policy

Finding optimal policies for Markov Decision Processes with large state spaces is in general intractable. Nonetheless, simulation-based algorithms inspired by Sparse Sampling (SS) such as Upper Confidence Bound applied in Trees (UCT) and Forward Search Sparse Sampling (FSSS) have been shown to perfo...

Bibliographic Details
Main Authors:	Nguyen T., Silander T., Lee W., Tze-Yun LEONG
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2014
Subjects:	Markov decision process; sparse sampling; forward sparse sampling; UCT; heuristic; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/3000 https://ink.library.smu.edu.sg/context/sis_research/article/4000/viewcontent/7934_37003_2_PB.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4000
record_format	dspace
spelling	sg-smu-ink.sis_research-4000 2018-07-13T04:35:00Z 2014-06-01T07:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Markov decision process; sparse sampling; forward sparse sampling; UCT; heuristic; Theory and Algorithms
description	Finding optimal policies for Markov Decision Processes with large state spaces is in general intractable. Nonetheless, simulation-based algorithms inspired by Sparse Sampling (SS) such as Upper Confidence Bound applied in Trees (UCT) and Forward Search Sparse Sampling (FSSS) have been shown to perform reasonably well in both theory and practice, despite the high computational demand. To improve the efficiency of these algorithms, we adopt a simple enhancement technique with a heuristic policy to speed up the selection of optimal actions. The general method, called Aux, augments the look-ahead tree with auxiliary arms that are evaluated by the heuristic policy. In this paper, we provide theoretical justification for the method and demonstrate its effectiveness in two experimental benchmarks that showcase the faster convergence to a near optimal policy for both SS and FSSS. Moreover, to further speed up the convergence of these algorithms at the early stage, we present a novel mechanism to combine them with UCT so that the resulting hybrid algorithm is superior to both of its components.
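The idea of an auxiliary arm in the description above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the exact definition of Aux, so the sketch assumes that each node of a Sparse Sampling tree receives one extra arm whose value is the average discounted return of rolling out the heuristic policy. The names `ss_aux`, `rollout`, `chain_sim` and `go_right`, the toy chain MDP, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def rollout(sim, policy, state, depth, gamma, rng):
    """Discounted return of following `policy` for `depth` steps from `state`."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward = sim(state, policy(state), rng)
        total += discount * reward
        discount *= gamma
    return total


def ss_aux(sim, actions, policy, state, depth, width, gamma, rng):
    """Sparse Sampling estimate of (value, action) with one auxiliary arm.

    Assumption: the auxiliary arm is scored by `width` rollouts of the
    heuristic `policy`; the paper may define it differently.
    """
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in actions:
        q = 0.0
        for _ in range(width):
            nxt, reward = sim(state, action, rng)
            value, _ = ss_aux(sim, actions, policy, nxt,
                              depth - 1, width, gamma, rng)
            q += reward + gamma * value
        q /= width
        if q > best_value:
            best_value, best_action = q, action
    # Auxiliary arm: value of simply following the heuristic from here.
    aux = sum(rollout(sim, policy, state, depth, gamma, rng)
              for _ in range(width)) / width
    if aux > best_value:
        best_value, best_action = aux, policy(state)
    return best_value, best_action


def chain_sim(state, action, rng):
    """Hypothetical 5-state chain (0..4); reward 1.0 for landing on state 4."""
    nxt = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 4 else 0.0)


def go_right(state):
    """Hypothetical heuristic policy: always move right."""
    return "right"


if __name__ == "__main__":
    rng = random.Random(0)
    print(ss_aux(chain_sim, ["left", "right"], go_right,
                 state=3, depth=2, width=2, gamma=0.9, rng=rng))
```

Because the node value is the maximum over the regular arms and the auxiliary arm, the estimate in this sketch never falls below the sampled return of the heuristic policy, which is one plausible reading of how a suboptimal policy can bootstrap the search.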
format	text
author	Nguyen T., Silander T., Lee W., Tze-Yun LEONG
title	Bootstrapping simulation-based algorithms with a suboptimal policy
publisher	Institutional Knowledge at Singapore Management University
publishDate	2014
url	https://ink.library.smu.edu.sg/sis_research/3000 https://ink.library.smu.edu.sg/context/sis_research/article/4000/viewcontent/7934_37003_2_PB.pdf
_version_	1770572775221624832

Similar Items