Action selection for composable modular deep reinforcement learning

In modular reinforcement learning (MRL), a complex decision-making problem is decomposed into multiple simpler subproblems, each solved by a separate module. Often, these subproblems have conflicting goals and incomparable reward scales. A composable decision-making architecture requires that even modules authored separately, with possibly misaligned reward scales, can be combined coherently. An arbitrator should consider the different modules' action preferences to learn effective global action selection. We present a novel framework called GRACIAS that assigns fine-grained importance to the different modules based on their relevance in a given state, and enables composable decision making based on modern deep RL methods such as deep deterministic policy gradient (DDPG) and deep Q-learning. We provide insights into the convergence properties of GRACIAS and also show that previous MRL algorithms reduce to special cases of our framework. We experimentally demonstrate on several standard MRL domains that our approach works significantly better than previous MRL methods, and is highly robust to incomparable reward scales. Our framework extends MRL to complex Atari games such as Qbert, and has a better learning curve than conventional RL algorithms.
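The abstract describes state-dependent arbitration over modules: an arbitrator weighs each module's action preferences by its relevance in the current state. The paper's exact GRACIAS formulation is in the linked PDF; the sketch below is only an illustration of that general idea under assumed details. All names (ArbitratorNet, select_action) and the softmax normalization of per-module Q-values are hypothetical, not taken from the paper.

```python
# Illustrative sketch of state-dependent module arbitration, in the
# spirit of the abstract. Hypothetical names; not the GRACIAS algorithm
# itself -- see the linked AAMAS 2021 PDF for the actual method.
import torch
import torch.nn as nn


class ArbitratorNet(nn.Module):
    """Maps a state to one softmax weight per module (its 'importance')."""

    def __init__(self, state_dim: int, n_modules: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_modules),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Shape: (batch, n_modules); weights sum to 1 per state.
        return torch.softmax(self.net(state), dim=-1)


def select_action(state, module_q_values, arbitrator):
    """Combine per-module action preferences into one global action.

    module_q_values: (batch, n_modules, n_actions), one row of Q-values
    per module. Softmax-normalizing each module's Q-values makes
    differently scaled rewards comparable -- an assumption here; the
    paper may use a different normalization.
    """
    prefs = torch.softmax(module_q_values, dim=-1)   # per-module preferences
    weights = arbitrator(state).unsqueeze(-1)        # (batch, n_modules, 1)
    global_pref = (weights * prefs).sum(dim=1)       # (batch, n_actions)
    return global_pref.argmax(dim=-1)                # greedy global action
```

In this reading, setting all arbitrator weights equal would recover a simple preference-averaging scheme, which is consistent with the abstract's claim that previous MRL algorithms reduce to special cases of the framework.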

Bibliographic Details
Main Authors: GUPTA, Vaibhav, ANAND, Daksh, PARUCHURI, Praveen, KUMAR, Akshat
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: Reinforcement Learning; Coordination and Control; Deep Learning; Theory and Algorithms
Online Access:https://ink.library.smu.edu.sg/sis_research/6179
https://ink.library.smu.edu.sg/context/sis_research/article/7182/viewcontent/AAMAS_2021.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/