Integrating temporal difference methods and self‐organizing neural networks for reinforcement learning with delayed evaluative feedback

This paper presents a neural architecture for learning category nodes encoding mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architec...

Bibliographic Details
Main Authors:	TAN, Ah-hwee; LU, Ning; XIAO, Dan
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2008
Subjects:	Reinforcement learning; self-organizing neural networks (NNs); temporal difference (TD) methods; Computer Engineering; Databases and Information Systems; OS and Networks
Online Access:	https://ink.library.smu.edu.sg/sis_research/5237 https://ink.library.smu.edu.sg/context/sis_research/article/6240/viewcontent/Integrating_Temporal_Difference_Methods_and.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-6240
record_format	dspace
spelling	sg-smu-ink.sis_research-6240 2020-07-23T18:25:50Z 2008-02-01T08:00:00Z text application/pdf info:doi/10.1109/TNN.2007.905839 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Reinforcement learning; self-organizing neural networks (NNs); temporal difference (TD) methods; Computer Engineering; Databases and Information Systems; OS and Networks
description	This paper presents a neural architecture for learning category nodes encoding mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state-action space estimated through on-policy and off-policy TD learning methods, specifically state-action-reward-state-action (SARSA) and Q-learning. The learned value functions are then used to determine the optimal actions based on an action selection policy. We have developed TD-FALCON systems using various TD learning strategies and compared their performance in terms of task completion, learning speed, and time and space efficiency. Experiments based on a minefield navigation task have shown that TD-FALCON systems are able to learn effectively with both immediate and delayed reinforcement and achieve stable performance at a pace much faster than that of standard gradient-descent-based reinforcement learning systems.
format	text
author	TAN, Ah-hwee; LU, Ning; XIAO, Dan
author_sort	TAN, Ah-hwee
title	Integrating temporal difference methods and self‐organizing neural networks for reinforcement learning with delayed evaluative feedback
publisher	Institutional Knowledge at Singapore Management University
publishDate	2008
url	https://ink.library.smu.edu.sg/sis_research/5237 https://ink.library.smu.edu.sg/context/sis_research/article/6240/viewcontent/Integrating_Temporal_Difference_Methods_and.pdf
_version_	1770575345308663808
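
The description field above names two TD learning methods, SARSA (on-policy) and Q-learning (off-policy). As a rough illustration of how the two differ, the sketch below shows their textbook tabular update rules. It is not the TD-FALCON model itself, which per the description estimates values with an ART-based neural network; the function names, the dictionary-backed Q-table, and the default `alpha` and `gamma` values are all assumptions made for this example.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually chosen in s_next.

    q is a hypothetical tabular Q-function keyed by (state, action).
    """
    td_error = r + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    best_next = max(q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error


def make_table():
    """Build a tiny Q-table in which the two actions in state 's1' differ in value."""
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 2.0
    return q


if __name__ == "__main__":
    q_sarsa = make_table()
    # The agent's policy happened to pick the lower-valued action "left" next.
    sarsa_update(q_sarsa, "s0", "go", 0.0, "s1", "left")

    q_ql = make_table()
    # Q-learning ignores the action actually taken and uses the best one.
    q_learning_update(q_ql, "s0", "go", 0.0, "s1", ["left", "right"])

    print(q_sarsa[("s0", "go")], q_ql[("s0", "go")])
```

The only difference between the two rules is the bootstrap target: SARSA uses the value of the next action the policy actually selects, while Q-learning uses the maximum value over the next state's actions, regardless of what the agent does next.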

Similar Items