Offline RL with discrete proxy representations for generalizability in POMDPs

Offline Reinforcement Learning (RL) has demonstrated promising results in various applications by learning policies from previously collected datasets, reducing the need for online exploration and interactions. However, real-world scenarios usually involve partial observability, which brings crucial...

Bibliographic Details
Main Authors:	GU, Pengjie; CAI, Xinyu; XING, Dong; WANG, Xinrun; ZHAO, Mengchen; AN, Bo
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/9048 https://ink.library.smu.edu.sg/context/sis_research/article/10051/viewcontent/Offline_rl_with_discrete_proxy_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10051
record_format	dspace
spelling	sg-smu-ink.sis_research-10051 2024-07-25T07:40:50Z
2023-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9048 https://ink.library.smu.edu.sg/context/sis_research/article/10051/viewcontent/Offline_rl_with_discrete_proxy_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing Theory and Algorithms
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing; Theory and Algorithms
description	Offline Reinforcement Learning (RL) has demonstrated promising results in various applications by learning policies from previously collected datasets, reducing the need for online exploration and interaction. However, real-world scenarios usually involve partial observability, which poses two crucial challenges for the deployment of offline RL methods: i) a policy trained on data with full observability is not robust to masked observations during execution, and ii) which parts of the observations will be masked is usually unknown during training. To address these challenges, we present Offline RL with DiscrEte pRoxy representations (ORDER), a probabilistic framework that leverages novel state representations to improve robustness against diverse masked observabilities. Specifically, we propose a discrete representation of the states and use a proxy representation to recover the states from masked, partially observable trajectories. The training of ORDER can be compactly described in three steps: i) learning the discrete state representations on data with full observations, ii) training the decision module based on the discrete representations, and iii) training the proxy discrete representations on data with various partial observations, aligning them with the discrete representations. We conduct extensive experiments to evaluate ORDER, showcasing its effectiveness in offline RL for diverse partially observable scenarios and highlighting the significance of discrete proxy representations for generalization performance. ORDER is a flexible framework that can employ any offline RL algorithm, and we hope that ORDER can pave the way for the deployment of RL policies against various partial observabilities in the real world.
format	text
author	GU, Pengjie; CAI, Xinyu; XING, Dong; WANG, Xinrun; ZHAO, Mengchen; AN, Bo
author_sort	GU, Pengjie
title	Offline RL with discrete proxy representations for generalizability in POMDPs
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/9048 https://ink.library.smu.edu.sg/context/sis_research/article/10051/viewcontent/Offline_rl_with_discrete_proxy_av.pdf
_version_	1814047717466308608
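
The three training steps summarized in the description field can be illustrated with a toy sketch. This is not the authors' implementation: k-means stands in for the discrete representation learner, a majority-action table stands in for the offline RL decision module, and nearest-centroid matching on zero-filled masked observations stands in for the proxy representation. All function names, the masking scheme, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the closest codebook row for each row of x."""
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def fit_codebook(states, k, iters=20):
    """Step i (stand-in): k-means with farthest-point initialisation."""
    codebook = [states[0]]
    for _ in range(k - 1):
        d = ((states[:, None, :] - np.array(codebook)[None]) ** 2).sum(-1).min(axis=1)
        codebook.append(states[d.argmax()])
    codebook = np.array(codebook, dtype=float)
    for _ in range(iters):
        codes = nearest_code(states, codebook)
        for j in range(k):
            if np.any(codes == j):  # keep the old centroid if a code is unused
                codebook[j] = states[codes == j].mean(axis=0)
    return codebook

def fit_policy(codes, actions, k, n_actions):
    """Step ii (stand-in): majority action per discrete code."""
    counts = np.zeros((k, n_actions), dtype=int)
    np.add.at(counts, (codes, actions), 1)
    return counts.argmax(axis=1)

def fit_proxy(states, codes, k, rng, keep_prob=0.5):
    """Step iii (stand-in): per-code centroids of randomly masked observations.

    Assumes every code has at least one assigned state.
    """
    mask = rng.random(states.shape) < keep_prob  # True = dimension stays visible
    masked = states * mask                       # hidden dimensions are zero-filled
    return np.array([masked[codes == j].mean(axis=0) for j in range(k)])

# Synthetic fully observed dataset: two well-separated state clusters,
# each paired with its own action (hypothetical data).
rng = np.random.default_rng(0)
centers = np.array([[5.0] * 4, [-5.0] * 4])
labels = rng.integers(0, 2, size=200)
states = centers[labels] + rng.normal(scale=0.5, size=(200, 4))
actions = labels

codebook = fit_codebook(states, k=2)                     # step i
codes = nearest_code(states, codebook)
policy = fit_policy(codes, actions, k=2, n_actions=2)    # step ii
proxy = fit_proxy(states, codes, k=2, rng=rng)           # step iii

# At execution time only a masked observation is available; the proxy maps it
# to a discrete code, and the decision module acts on that code.
obs = np.array([[0.0, 0.0, 5.0, 5.0]])  # first two dimensions masked
action = policy[nearest_code(obs, proxy)]
print(action)
```

The point of the sketch is the separation of concerns: the decision module only ever sees discrete codes, so any component that maps a masked observation to the right code can be swapped in without retraining the policy.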

Offline RL with discrete proxy representations for generalizability in POMDPs

Similar Items