Improving deep reinforcement learning with advanced exploration and transfer learning techniques
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2020
Online Access: https://hdl.handle.net/10356/137772
Institution: Nanyang Technological University
Summary: Deep reinforcement learning utilizes deep neural networks as function approximators to model the reinforcement learning policy, enabling the policy to be trained in an end-to-end manner. When applied to complex real-world problems such as video game playing and natural language processing, deep reinforcement learning algorithms often involve a tremendous number of parameters and an intractable search space, which results from the low-level modelling of the state space or from the complex nature of the problem. Therefore, constructing an effective exploration strategy to search through the solution space is crucial for deriving a policy that can tackle challenging problems. Furthermore, given the considerable computational resources and time consumed by policy training, it is also crucial to improve the transferability of the algorithm so as to create versatile and generalizable policies.
In this thesis, I present a study on improving deep reinforcement learning algorithms from the perspectives of exploration and transfer learning. The study of exploration mainly focuses on solving hard exploration problems in the Atari 2600 game suite and in partially observable navigation domains with extremely sparse rewards. Three exploration algorithms are discussed: a planning-based algorithm with deep hashing techniques to improve search efficiency, a distributed framework with an exploration-incentivizing novelty model to increase sample throughput while gathering more novel experiences, and a sequence-level novelty model designed for sparsely rewarded, partially observable domains. In an attempt to improve the generalization ability of the policy, I also discuss two policy transfer algorithms, which address multi-task policy distillation and zero-shot policy transfer, respectively.
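To give a flavour of the hashing-based exploration idea mentioned above, the following is a minimal, illustrative sketch of count-based exploration with state hashing (in the spirit of SimHash-style techniques). It is not the thesis's actual algorithm; the class and parameter names here are hypothetical.

```python
import numpy as np

class HashingExplorationBonus:
    """Illustrative sketch: map high-dimensional states to discrete hash
    codes via random projections, count visits per code, and return an
    exploration bonus that decays with the visit count."""

    def __init__(self, state_dim, code_bits=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection matrix defining the binary hash.
        self.projection = rng.standard_normal((code_bits, state_dim))
        self.counts = {}   # hash code -> visit count
        self.beta = beta   # bonus scale

    def bonus(self, state):
        # Binary code: sign pattern of the projected state.
        code = tuple((self.projection @ np.asarray(state) > 0).astype(int))
        self.counts[code] = self.counts.get(code, 0) + 1
        # Rarely hashed (novel) states receive a larger bonus.
        return self.beta / np.sqrt(self.counts[code])
```

In use, the bonus would be added to the environment reward so the agent is incentivized to visit states whose hash codes have been seen infrequently; repeated visits to similar states shrink the bonus toward zero.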
The above-mentioned study has been evaluated in video game playing domains with high-dimensional pixel-level inputs. The evaluated domains consist of the Atari 2600 game suite, ViZDoom and DeepMind Lab. The presented approaches demonstrate desirable properties for improving policy performance through the advanced exploration or transfer learning mechanisms. Finally, I conclude by discussing open questions and future directions in applying the presented exploration and transfer learning techniques in more general and practical scenarios.