Time-inconsistent objectives in reinforcement learning

In reinforcement learning, one of the most intriguing and long-standing problems is how to assign credit to historical events efficiently and meaningfully. Within temporal credit assignment, time inconsistency is a challenging sub-domain that was noticed long ago but still lacks sy...

Bibliographic Details
Main Author:	Su, Huangyuan
Other Authors:	PUN Chi Seng
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Science::Mathematics
Online Access:	https://hdl.handle.net/10356/148520
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-148520
record_format	dspace
spelling	sg-ntu-dr.10356-1485202023-02-28T23:17:57Z Time-inconsistent objectives in reinforcement learning Su, Huangyuan PUN Chi Seng School of Physical and Mathematical Sciences Nixie Sapphira Lesmana cspun@ntu.edu.sg Science::Mathematics In Reinforcement Learning, one of the most intriguing and long-lasting problems is about how to assign credit to historical events efficiently and meaningfully. And within temporal credit assignment problems, time inconsistency is a challenging sub-domain that was noticed long ago but still lacks systematic treatment. The goal of this work is to search for efficient algorithms that converge to equilibrium policies in the presence of time-inconsistent objectives. In this work, we first provide a brief introduction on reinforcement learning and control theory; then, we define the time-inconsistent problem, both illustratively and formally. After that, we propose a general backward update framework based on game theory. This framework is theoretically proven to be able to find the equilibrium control under time-inconsistency. We also review and implement a forward update algorithm that is able to find the equilibrium control in the special case of hyperbolic discounting but has many limitations. The literature review introduces other time-inconsistent situations and algorithms that deal with the efficient temporal credit assignment problem. Finally, we conclude the report and point out the future directions. Bachelor of Science in Mathematical Sciences 2021-05-05T08:34:19Z 2021-05-05T08:34:19Z 2021 Final Year Project (FYP) Su, H. (2021). Time-inconsistent objectives in reinforcement learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148520 https://hdl.handle.net/10356/148520 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Science::Mathematics
spellingShingle	Science::Mathematics Su, Huangyuan Time-inconsistent objectives in reinforcement learning
description	In reinforcement learning, one of the most intriguing and long-standing problems is how to assign credit to historical events efficiently and meaningfully. Within temporal credit assignment, time inconsistency is a challenging sub-domain that was noticed long ago but still lacks systematic treatment. The goal of this work is to find efficient algorithms that converge to equilibrium policies in the presence of time-inconsistent objectives. We first give a brief introduction to reinforcement learning and control theory, then define the time-inconsistent problem both illustratively and formally. We then propose a general backward update framework based on game theory, which is theoretically proven to find the equilibrium control under time inconsistency. We also review and implement a forward update algorithm that finds the equilibrium control in the special case of hyperbolic discounting but has many limitations. The literature review introduces other time-inconsistent settings and algorithms that address efficient temporal credit assignment. Finally, we conclude the report and point out future directions.
author2	PUN Chi Seng
author_facet	PUN Chi Seng Su, Huangyuan
format	Final Year Project
author	Su, Huangyuan
author_sort	Su, Huangyuan
title	Time-inconsistent objectives in reinforcement learning
title_short	Time-inconsistent objectives in reinforcement learning
title_full	Time-inconsistent objectives in reinforcement learning
title_fullStr	Time-inconsistent objectives in reinforcement learning
title_full_unstemmed	Time-inconsistent objectives in reinforcement learning
title_sort	time-inconsistent objectives in reinforcement learning
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/148520
_version_	1759857446708314112

Similar Items