Action selection for composable modular deep reinforcement learning


Overview

Bibliographic Details
Main Authors: GUPTA, Vaibhav, ANAND, Daksh, PARUCHURI, Praveen, KUMAR, Akshat
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/6900
https://ink.library.smu.edu.sg/context/sis_research/article/7903/viewcontent/Action_Selection_for_Composable_Modular_Deep_Reinforcement_Learning.pdf
Description
Summary: In modular reinforcement learning (MRL), a complex decision-making problem is decomposed into multiple simpler subproblems, each solved by a separate module. Often, these subproblems have conflicting goals and incomparable reward scales. A composable decision-making architecture requires that even modules authored separately, with possibly misaligned reward scales, can be combined coherently. An arbitrator should consider the different modules' action preferences to learn effective global action selection. We present a novel framework called GRACIAS that assigns fine-grained importance to the different modules based on their relevance in a given state, and enables composable decision making based on modern deep RL methods such as deep deterministic policy gradient (DDPG) and deep Q-learning. We provide insights into the convergence properties of GRACIAS and also show that previous MRL algorithms reduce to special cases of our framework. We experimentally demonstrate on several standard MRL domains that our approach works significantly better than previous MRL methods and is highly robust to incomparable reward scales. Our framework extends MRL to complex Atari games such as Qbert, and has a better learning curve than conventional RL algorithms.
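
To make the arbitration idea described in the abstract concrete, the short Python sketch below combines per-module Q-values with state-dependent importance weights to choose one global action. This is a hypothetical minimal example, not the paper's GRACIAS algorithm: the function name arbitrate, the min-max normalisation used to cope with incomparable reward scales, and the fixed example weights are all illustrative assumptions; in the paper the module weights would be learned per state.

import numpy as np

def arbitrate(module_q_values, module_weights):
    """Pick a global action from per-module action preferences.

    module_q_values: shape (num_modules, num_actions); each row is one
        module's Q-value estimate for every action in the current state.
    module_weights: shape (num_modules,); importance of each module in the
        current state (assumed here to come from a learned arbitrator).
    """
    q = np.asarray(module_q_values, dtype=float)
    w = np.asarray(module_weights, dtype=float)

    # Normalise each module's Q-values so that incomparable reward scales do
    # not let one module dominate simply because its rewards are larger.
    q_min = q.min(axis=1, keepdims=True)
    q_range = q.max(axis=1, keepdims=True) - q_min
    q_norm = (q - q_min) / np.where(q_range > 0, q_range, 1.0)

    # Weighted sum of module preferences, then greedy global action selection.
    combined = w @ q_norm
    return int(np.argmax(combined))

# Example: two modules, three actions; the second module is twice as relevant
# in this state, so its preferred action (index 2) wins.
action = arbitrate([[1.0, 5.0, 3.0], [0.2, 0.1, 0.4]], [1.0, 2.0])

Earlier MRL schemes that give every module equal, state-independent influence correspond to setting all entries of module_weights to the same constant, which is one way to read the paper's claim that previous MRL algorithms are special cases of the framework.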