End-to-end hierarchical reinforcement learning with integrated subgoal discovery

Hierarchical reinforcement learning (HRL) is a promising approach for performing long-horizon goal-reaching tasks by decomposing the goals into subgoals. In a holistic HRL paradigm, an agent must autonomously discover such subgoals and also learn a hierarchy of policies that uses them to reach the goals. Recently introduced end-to-end HRL methods accomplish this by using the higher-level policy in the hierarchy to directly search for useful subgoals in a continuous subgoal space. However, learning such a policy may be challenging when the subgoal space is large. We propose integrated discovery of salient subgoals (LIDOSS), an end-to-end HRL method with an integrated subgoal discovery heuristic that reduces the search space of the higher-level policy by explicitly focusing on the subgoals that have a greater probability of occurrence on the various state-transition trajectories leading to the goal. We evaluate LIDOSS on a set of continuous control tasks in the MuJoCo domain against Hierarchical Actor-Critic (HAC), a state-of-the-art end-to-end HRL method. The results show that LIDOSS attains better goal achievement rates than HAC in most of the tasks.
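
As a rough illustration of the heuristic described in the abstract (not the paper's implementation), the Python sketch below scores candidate subgoals by how often they occur on recorded trajectories that actually reached the goal, and samples the higher-level policy's subgoal proposals with those scores as weights. The function names (subgoal_saliency, sample_subgoal), the toy trajectories, and the temperature parameter are illustrative assumptions only.

    # Minimal sketch of "salient subgoal" scoring and weighted subgoal proposal.
    import random
    from collections import Counter

    def subgoal_saliency(trajectories, goal):
        """Fraction of goal-reaching trajectories that pass through each state."""
        successful = [t for t in trajectories if t[-1] == goal]
        if not successful:
            return {}
        counts = Counter(s for t in successful for s in set(t))
        return {s: c / len(successful) for s, c in counts.items()}

    def sample_subgoal(saliency, subgoal_space, temperature=1.0):
        """Sample a subgoal with probability increasing in its saliency score.
        Stands in for the higher-level policy's restricted subgoal search."""
        weights = [saliency.get(s, 0.0) ** (1.0 / temperature) + 1e-3
                   for s in subgoal_space]
        return random.choices(subgoal_space, weights=weights, k=1)[0]

    if __name__ == "__main__":
        # Toy trajectories over integer states; the goal state is 9.
        trajectories = [
            [0, 1, 2, 5, 6, 9],   # reached the goal
            [0, 1, 4, 5, 8, 9],   # reached the goal
            [0, 3, 4, 7],         # did not reach the goal
        ]
        scores = subgoal_saliency(trajectories, goal=9)
        print(sorted(scores.items(), key=lambda kv: -kv[1]))
        print("proposed subgoal:", sample_subgoal(scores, list(range(10))))

In this toy run, states that lie on both successful trajectories (0, 1, 5, 9) receive the highest scores, so the higher level concentrates its proposals on them rather than searching the whole state space uniformly.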

Bibliographic Details
Main Authors: PATERIA, Shubham, SUBAGDJA, Budhitama, TAN, Ah-hwee, QUEK, Chai
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Collection: Research Collection School Of Computing and Information Systems
Subjects: Hierarchical reinforcement learning (HRL); Reinforcement learning; Subgoal discovery; Task analysis; Artificial Intelligence and Robotics; Databases and Information Systems
DOI: 10.1109/TNNLS.2021.3087733
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://ink.library.smu.edu.sg/sis_research/6416
https://ink.library.smu.edu.sg/context/sis_research/article/7419/viewcontent/End_to_End_Hierarchical_Reinforcement_Learning___IEEE_TNNLS_2021__Preprint_.pdf
Institution: Singapore Management University