Towards Efficient Computation of Quality Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs

Bibliographic Details
Main Authors: VARAKANTHAM, Pradeep Reddy; MAHESWARAN, Rajiv; GUPTA, Tapana; TAMBE, Milind
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2007
Online Access: https://ink.library.smu.edu.sg/sis_research/956
https://ink.library.smu.edu.sg/context/sis_research/article/1955/viewcontent/IJCAI07.pdf
Institution: Singapore Management University
Description
Summary: While POMDPs (partially observable Markov decision problems) are a popular computational model with wide-ranging applications, the computational cost of optimal policy generation is prohibitive. Researchers are investigating ever more efficient algorithms, yet many applications demand that such algorithms bound any loss in policy quality when chasing efficiency. To address this challenge, we present two new techniques. The first approximates in the value space to obtain solutions efficiently for a pre-specified error bound. Unlike existing techniques, our technique guarantees that the resulting policy will meet this bound. Furthermore, it does not require costly computations to determine the quality loss of the policy. Our second technique prunes large tracts of belief space that are unreachable, allowing faster policy computation without any sacrifice in optimality. The combination of the two techniques, which are complementary to existing optimal policy generation algorithms, efficiently provides solutions with tight error bounds in domains where competing algorithms fail to provide such tight bounds.