Hierarchical reinforcement learning with integrated discovery of salient subgoals
Hierarchical Reinforcement Learning (HRL) is a promising approach for solving complex tasks that may be challenging for traditional reinforcement learning. HRL achieves this by decomposing a task into shorter-horizon subgoals that are simpler to achieve. Autonomous discovery of such subgoals is an important part of HRL. Recently, end-to-end HRL methods have been used to reduce the overhead of offline subgoal discovery by seeking useful subgoals while simultaneously learning optimal policies in a hierarchy. However, these methods may still suffer from slow learning when the search space used by the high-level policy to find subgoals is large. We propose LIDOSS, an end-to-end HRL method with an integrated heuristic for subgoal discovery. In LIDOSS, the search space of the high-level policy is reduced by focusing only on subgoal states that have high saliency. We evaluate LIDOSS on continuous control tasks in the MuJoCo Ant domain. The results show that LIDOSS outperforms Hierarchical Actor Critic (HAC), a state-of-the-art HRL method, on fixed-goal tasks.
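The abstract describes restricting the high-level policy's subgoal search to salient states within a two-level hierarchy. The following Python sketch is a rough illustration only, not the authors' LIDOSS implementation: the 1-D corridor environment, the saliency test, and both policies are invented for this example. It shows the general idea that the high-level policy chooses subgoals only from a reduced set of salient candidate states, while the low-level policy takes primitive actions toward the current subgoal.

```python
import random

# Toy illustration (not the authors' LIDOSS implementation): a two-level
# hierarchy in which the high-level policy proposes subgoals only from a
# set of "salient" states, shrinking its search space, while the low-level
# policy takes primitive steps toward the current subgoal.
# The environment, saliency test, and policies below are all hypothetical.

GRID = 10          # 1-D corridor of states 0 .. GRID-1
GOAL = GRID - 1    # fixed final goal


def is_salient(state):
    # Hypothetical saliency heuristic: keep only every third state (plus the
    # goal) as a candidate subgoal; stands in for a learned saliency score.
    return state % 3 == 0 or state == GOAL


SALIENT_STATES = [s for s in range(GRID) if is_salient(s)]


def high_level_policy(state):
    # Pick the nearest salient state that moves toward the goal; a learned
    # high-level policy would choose among SALIENT_STATES rather than all states.
    candidates = [s for s in SALIENT_STATES if s > state] or [GOAL]
    return min(candidates)


def low_level_policy(state, subgoal):
    # Primitive action: step +1 or -1 toward the current subgoal.
    return 1 if subgoal > state else -1


def run_episode(max_steps=50):
    state, trajectory = 0, [0]
    while state != GOAL and len(trajectory) < max_steps:
        subgoal = high_level_policy(state)
        # Low-level rollout until the subgoal (or a small step budget) is reached.
        for _ in range(5):
            state += low_level_policy(state, subgoal)
            trajectory.append(state)
            if state == subgoal:
                break
    return trajectory


if __name__ == "__main__":
    print(run_episode())
```

Because the high-level policy only ever considers the salient candidates, its effective search space is a fraction of the full state space, which is the intuition behind the reported speed-up over methods that search over all states for subgoals.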
Saved in:
Main Authors: | PATERIA, Shubham; SUBAGDJA, Budhitama; TAN, Ah-hwee |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
Subjects: | Hierarchical Reinforcement Learning; Reinforcement Learning; Subgoal discovery; Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6171 https://ink.library.smu.edu.sg/context/sis_research/article/7174/viewcontent/p1963.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-7174 |
---|---|
record_format | dspace |
DOI | info:doi/10.5555/3398761.3399042 |
License | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Collection | Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries) |
Published | 2020-05-01 (text, application/pdf) |