Fast reinforcement learning under uncertainties with self-organizing neural networks

Using feedback signals from the environment, a reinforcement learning (RL) system typically discovers action policies that recommend effective actions for the states based on a Q-value function. However, uncertainties in the estimation of the Q-values can delay the convergence of RL. To achieve fast RL convergence while accounting for such uncertainties, this paper proposes several enhancements to the estimation and learning of the Q-values using a self-organizing neural network. Specifically, the temporal difference method known as Q-learning is complemented by a Q-value Polarization procedure, which contrasts the Q-values using feedback signals on the effect of the recommended actions. The polarized Q-values are then learned by the self-organizing neural network using a Bi-directional Template Learning procedure. The polarized Q-values are, in turn, used to adapt the reward vigilance of the ART-based self-organizing neural network using a Bi-directional Adaptation procedure. The efficacy of the resultant system, called Fast Learning (FL) FALCON, is illustrated on two single-task problem domains with large MDPs. The experimental results from these problem domains consistently show FL-FALCON converging faster than the compared approaches.
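
For orientation, the sketch below shows the standard tabular Q-learning (temporal-difference) update that the paper's Q-value Polarization procedure complements. It is a minimal illustration only, not the FALCON implementation described in the abstract; the 5-state/2-action table, learning rate, and discount factor are illustrative assumptions, and the polarization, template learning, and vigilance adaptation steps are not modeled here.

```python
import numpy as np

# Minimal sketch (illustrative only): the tabular Q-learning TD update that the
# paper's Q-value Polarization procedure complements. This is not the FALCON
# implementation; alpha, gamma, and the 5-state / 2-action space are assumptions.

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference backup to the Q-value table Q."""
    td_target = reward + gamma * np.max(Q[next_state])  # bootstrap from the best next action
    td_error = td_target - Q[state, action]             # estimation error in the current Q-value
    Q[state, action] += alpha * td_error                # move the estimate toward the TD target
    return Q

# Toy usage: one transition with reward 1.0 in a 5-state, 2-action table.
Q = np.zeros((5, 2))
Q = q_learning_update(Q, state=0, action=1, reward=1.0, next_state=2)
print(Q[0, 1])  # 0.1 after a single update
```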

Bibliographic Details
Main Authors: TENG, Teck-Hou; TAN, Ah-hwee
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2015
Subjects: Databases and Information Systems; OS and Networks
Online Access: https://ink.library.smu.edu.sg/sis_research/6797
https://ink.library.smu.edu.sg/context/sis_research/article/7800/viewcontent/Fast_RL___WI_IAT_2015.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7800
record_format dspace
spelling sg-smu-ink.sis_research-7800 2022-01-27T08:34:42Z 2015-12-01T08:00:00Z text application/pdf info:doi/10.1109/WI-IAT.2015.103 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
OS and Networks
format text
author TENG, Teck-Hou
TAN, Ah-hwee
title Fast reinforcement learning under uncertainties with self-organizing neural networks
publisher Institutional Knowledge at Singapore Management University
publishDate 2015
url https://ink.library.smu.edu.sg/sis_research/6797
https://ink.library.smu.edu.sg/context/sis_research/article/7800/viewcontent/Fast_RL___WI_IAT_2015.pdf
_version_ 1770576070756532224