Self-regulating action exploration in reinforcement learning

The basic tenet of a learning process is for an agent to learn only as much, and for only as long, as is necessary. In reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to non-convergence or to over-training of the learning agent. This work addresses these issues by proposing a technique that self-regulates the exploration rate and the training duration, leading to efficient convergence. The idea originates from the intuitive understanding that exploration is only necessary when the success rate is low, which means the exploration rate should be kept in inverse proportion to the success rate. In addition, the change in exploration-exploitation rates alters the duration of the learning process, so the training duration becomes adaptive to the current status of the learning process. Experimental results on the K-Armed Bandit and Air Combat Maneuver scenarios show that optimal action policies can be discovered with the right number of training iterations. In essence, the proposed method eliminates the guesswork about how much exploration is needed during reinforcement learning.
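The core mechanism described in the abstract, an exploration rate kept in inverse proportion to the success rate with the training duration adapting to it, can be illustrated with a small sketch. The Python example below is a minimal, hypothetical illustration on a K-armed bandit, not the authors' self-organizing neural network implementation; the window size, the plateau-based stopping test, and the reading of "success" as a reward of 1 are assumptions made here for illustration only.

# Minimal sketch (assumptions noted above): success-rate-driven exploration
# on a K-armed bandit with a plateau-based stopping rule.
import random

K = 10
true_means = [random.random() for _ in range(K)]   # hidden reward probabilities
counts = [0] * K
values = [0.0] * K                                 # running mean reward per arm

window = []            # recent binary rewards, used as the success rate
epsilon = 1.0          # start fully exploratory
prev_rate = 0.0
step = 0

while True:
    step += 1
    if random.random() < epsilon:
        arm = random.randrange(K)                      # explore
    else:
        arm = max(range(K), key=lambda a: values[a])   # exploit best estimate

    reward = 1 if random.random() < true_means[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

    window.append(reward)
    if len(window) > 100:
        window.pop(0)
    success_rate = sum(window) / len(window)
    epsilon = 1.0 - success_rate   # explore only as much as failure demands

    # Self-regulated duration: stop once the success rate has plateaued.
    if step % 500 == 0:
        if abs(success_rate - prev_rate) < 0.01:
            break
        prev_rate = success_rate

best_arm = values.index(max(values))
print(f"stopped after {step} steps; epsilon = {epsilon:.2f}; best arm = {best_arm}")

In this sketch a high success rate drives epsilon toward zero, and training ends once the windowed success rate stops improving, which is the sense in which both the exploration rate and the training duration regulate themselves.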


Bibliographic Details
Main Authors: TENG, Teck-Hou; TAN, Ah-hwee
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2012
Subjects: reinforcement learning; exploration-exploitation dilemma; k-armed bandit; pursuit-evasion; self-organizing neural network; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/6282
https://ink.library.smu.edu.sg/context/sis_research/article/7285/viewcontent/Knowledge_based_Exploration___IAT_2012__1_.pdf
id sg-smu-ink.sis_research-7285
record_format dspace
spelling sg-smu-ink.sis_research-7285
2021-11-23T07:58:22Z
Self-regulating action exploration in reinforcement learning
TENG, Teck-Hou; TAN, Ah-hwee
2012-10-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/6282
info:doi/10.1016/j.procs.2012.09.110
https://ink.library.smu.edu.sg/context/sis_research/article/7285/viewcontent/Knowledge_based_Exploration___IAT_2012__1_.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
reinforcement learning; exploration-exploitation dilemma; k-armed bandit; pursuit-evasion; self-organizing neural network; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic reinforcement learning
exploration-exploitation dilemma
k-armed bandit
pursuit-evasion
self-organizing neural network
Artificial Intelligence and Robotics
Numerical Analysis and Scientific Computing
description The basic tenet of a learning process is for an agent to learn only as much, and for only as long, as is necessary. In reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to non-convergence or to over-training of the learning agent. This work addresses these issues by proposing a technique that self-regulates the exploration rate and the training duration, leading to efficient convergence. The idea originates from the intuitive understanding that exploration is only necessary when the success rate is low, which means the exploration rate should be kept in inverse proportion to the success rate. In addition, the change in exploration-exploitation rates alters the duration of the learning process, so the training duration becomes adaptive to the current status of the learning process. Experimental results on the K-Armed Bandit and Air Combat Maneuver scenarios show that optimal action policies can be discovered with the right number of training iterations. In essence, the proposed method eliminates the guesswork about how much exploration is needed during reinforcement learning.
format text
author TENG, Teck-Hou
TAN, Ah-hwee
title Self-regulating action exploration in reinforcement learning
publisher Institutional Knowledge at Singapore Management University
publishDate 2012
url https://ink.library.smu.edu.sg/sis_research/6282
https://ink.library.smu.edu.sg/context/sis_research/article/7285/viewcontent/Knowledge_based_Exploration___IAT_2012__1_.pdf
_version_ 1770575915000004608