Time-inconsistent objectives in reinforcement learning
Saved in:
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2021
Subjects:
Online Access: https://hdl.handle.net/10356/148520
Institution: Nanyang Technological University
Summary: In Reinforcement Learning, one of the most intriguing and long-standing problems is how to assign credit to past events efficiently and meaningfully. Within temporal credit assignment, time inconsistency is a challenging sub-domain that was noticed long ago but still lacks systematic treatment.
The goal of this work is to search for efficient algorithms that converge to equilibrium policies in the presence of time-inconsistent objectives. We first give a brief introduction to reinforcement learning and control theory; we then define the time-inconsistency problem, both illustratively and formally. After that, we propose a general backward-update framework based on game theory, which is proven to find the equilibrium control under time inconsistency. We also review and implement a forward-update algorithm that finds the equilibrium control in the special case of hyperbolic discounting, but which has many limitations. The literature review covers other time-inconsistent settings and algorithms that address efficient temporal credit assignment. Finally, we conclude the report and point out future directions.
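The time inconsistency induced by hyperbolic discounting, which the summary mentions, can be made concrete with a small sketch. The discount parameter and the reward schedule below are illustrative assumptions, not values from the report: a hyperbolic agent's preference between a smaller-sooner and a larger-later reward reverses as time passes, whereas under exponential discounting it never does.

```python
def hyperbolic_value(reward: float, delay: float, k: float = 1.0) -> float:
    """Present value under hyperbolic discounting D(t) = 1 / (1 + k*t)."""
    return reward / (1.0 + k * delay)

def exponential_value(reward: float, delay: float, gamma: float = 0.9) -> float:
    """Present value under standard exponential discounting gamma**t."""
    return reward * gamma ** delay

# Assumed schedule: a small reward of 10 at t = 4 vs a large reward of 15 at t = 6.
# Viewed from t = 0, the hyperbolic agent plans to wait for the larger reward...
assert hyperbolic_value(10, 4) < hyperbolic_value(15, 6)   # 2.0 < ~2.14
# ...but viewed from t = 3 (remaining delays 1 and 3), it abandons that plan
# and takes the smaller, sooner reward: a preference reversal.
assert hyperbolic_value(10, 1) > hyperbolic_value(15, 3)   # 5.0 > 3.75

# Exponential discounting is time-consistent: gamma**(t+d) / gamma**t does not
# depend on the evaluation time t, so the ranking never flips.
assert exponential_value(10, 4) < exponential_value(15, 6)
assert exponential_value(10, 1) < exponential_value(15, 3)
```

An equilibrium policy in this setting is one that no future "self" of the agent has an incentive to deviate from, which is why the report frames the problem game-theoretically.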