Action selection for composable modular deep reinforcement learning

In modular reinforcement learning (MRL), a complex decision-making problem is decomposed into multiple simpler subproblems, each solved by a separate module. Often, these subproblems have conflicting goals and incomparable reward scales. A composable decision-making architecture requires that even modules authored separately, with possibly misaligned reward scales, can be combined coherently. An arbitrator should consider the different modules' action preferences to learn effective global action selection. We present a novel framework called GRACIAS that assigns fine-grained importance to the different modules based on their relevance in a given state, and enables composable decision making based on modern deep RL methods such as deep deterministic policy gradient (DDPG) and deep Q-learning. We provide insights into the convergence properties of GRACIAS and also show that previous MRL algorithms reduce to special cases of our framework. We experimentally demonstrate on several standard MRL domains that our approach performs significantly better than previous MRL methods and is highly robust to incomparable reward scales. Our framework extends MRL to complex Atari games such as Qbert, and has a better learning curve than conventional RL algorithms.
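
The abstract's central mechanism, an arbitrator that assigns state-dependent importance weights to modules and selects a global action from their combined preferences, can be illustrated with a small sketch. The snippet below is a simplified NumPy illustration under assumed interfaces, not the paper's GRACIAS implementation (which builds on deep RL methods such as DDPG and deep Q-learning); every class, function, and parameter name here is hypothetical.

import numpy as np

def softmax(x):
    # Numerically stable softmax: normalizes module scores to a simplex.
    z = np.exp(x - np.max(x))
    return z / z.sum()

class Arbitrator:
    # Toy linear arbitrator: maps a state vector to one importance
    # weight per module (a stand-in for a learned deep network).
    def __init__(self, state_dim, n_modules, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_modules, state_dim))

    def weights(self, state):
        return softmax(self.W @ state)

def select_action(state, module_qs, arbitrator):
    # module_qs: one callable per module, each returning a Q-value
    # vector over a shared discrete action space.
    w = arbitrator.weights(state)                      # (n_modules,)
    q = np.stack([q_fn(state) for q_fn in module_qs])  # (n_modules, n_actions)
    # State-dependent weighting, rather than a fixed sum of raw Q-values,
    # is the hook that lets a learned arbitrator compensate for modules
    # with incomparable reward scales.
    return int(np.argmax(w @ q))

# Usage: two toy modules over 3 actions with very different reward scales.
state = np.array([0.5, -1.0])
module_qs = [lambda s: np.array([1.0, 2.0, 0.5]),
             lambda s: np.array([100.0, -50.0, 300.0])]
print(select_action(state, module_qs, Arbitrator(state_dim=2, n_modules=2)))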

Bibliographic Details
Main Authors: GUPTA, Vaibhav, ANAND, Daksh, PARUCHURI, Praveen, KUMAR, Akshat
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: Reinforcement learning; Coordination and control; Deep learning; Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/6900
https://ink.library.smu.edu.sg/context/sis_research/article/7903/viewcontent/Action_Selection_for_Composable_Modular_Deep_Reinforcement_Learning.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-7903
Record Format: dspace
Last Updated: 2022-02-07T10:51:58Z
Date Published: 2021-05-01
Format: text (application/pdf)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Publisher: Institutional Knowledge at Singapore Management University
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
Content Provider: SMU Libraries, Singapore Management University, Singapore (Asia)
Online Access: https://ink.library.smu.edu.sg/sis_research/6900
https://ink.library.smu.edu.sg/context/sis_research/article/7903/viewcontent/Action_Selection_for_Composable_Modular_Deep_Reinforcement_Learning.pdf
Version: 1770576116277313536