Difference of convex functions programming for policy optimization in reinforcement learning
We formulate the problem of optimizing an agent's policy within the Markov decision process (MDP) model as a difference-of-convex functions (DC) program. The DC perspective enables optimizing the policy iteratively where each iteration constructs an easier-to-optimize lower bound on the value f...
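The iterative scheme sketched in the abstract, where each step optimizes an easier surrogate bound built from the current iterate, is characteristic of difference-of-convex (DC) programming. Below is a minimal illustrative sketch on a toy one-dimensional problem; the functions `g`, `h`, and the closed-form surrogate minimizer are assumptions chosen for illustration, not the paper's actual policy-optimization formulation (there the bound is on the value function, so a lower bound on the objective being maximized corresponds to the convex upper bound used here for minimization):

```python
# DC programming sketch: minimize f(x) = g(x) - h(x), with both g and h convex.
# Toy choice (illustrative only): g(x) = x**2, h(x) = 2*|x|, so f has minima at x = +/-1.
# Each iteration replaces h by its tangent (a linear lower bound on h), which
# makes the surrogate a convex upper bound on f that is easy to minimize exactly.

def g(x):
    return x * x          # convex part kept exactly

def h(x):
    return 2.0 * abs(x)   # convex part that gets linearized

def dc_step(x_k):
    # Subgradient of h at the current iterate x_k.
    s = 2.0 if x_k >= 0 else -2.0
    # Surrogate: x**2 - (h(x_k) + s*(x - x_k)); its exact minimizer is x = s/2.
    return s / 2.0

x = 0.7
for _ in range(10):
    x = dc_step(x)
# The iterates converge to the stationary point x = 1, where f(1) = -1.
```

Because the linearization touches `h` at `x_k`, the surrogate value matches `f` there, so each exact surrogate minimization cannot increase `f`; this monotone-improvement property is what makes the DC perspective attractive for iterative policy optimization.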
Main Author: KUMAR, Akshat
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Online Access: https://ink.library.smu.edu.sg/sis_research/9926
Institution: Singapore Management University
Similar Items
- Constrained reinforcement learning in hard exploration problems
  by: PATHMANATHAN, Pankayaraj, et al.
  Published: (2023)
- Integrating motivated learning and k-winner-take-all to coordinate multi-agent reinforcement learning
  by: TENG, Teck-Hou, et al.
  Published: (2014)
- Motivated learning as an extension of reinforcement learning
  by: STARZYK, Janusz, et al.
  Published: (2010)
- Reinforcement learning for zone based multiagent pathfinding under uncertainty
  by: LING, Jiajing, et al.
  Published: (2020)
- Constrained multiagent reinforcement learning for large agent population
  by: LING, Jiajing, et al.
  Published: (2022)