Difference of convex functions programming for policy optimization in reinforcement learning
We formulate the problem of optimizing an agent's policy within the Markov decision process (MDP) model as a difference-of-convex functions (DC) program. The DC perspective enables optimizing the policy iteratively where each iteration constructs an easier-to-optimize lower bound on the value f...
Main Author: | KUMAR, Akshat |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/9926 https://ink.library.smu.edu.sg/context/sis_research/article/10926/viewcontent/ConvexFunctionsProg_pvoa_cc_by.pdf |
Institution: | Singapore Management University |
Similar Items
- Constrained reinforcement learning in hard exploration problems
  by: PATHMANATHAN, Pankayaraj, et al.
  Published: (2023)
- Integrating motivated learning and k-winner-take-all to coordinate multi-agent reinforcement learning
  by: TENG, Teck-Hou, et al.
  Published: (2014)
- Motivated learning as an extension of reinforcement learning
  by: STARZYK, Janusz, et al.
  Published: (2010)
- Reinforcement learning for zone based multiagent pathfinding under uncertainty
  by: LING, Jiajing, et al.
  Published: (2020)
- Constrained multiagent reinforcement learning for large agent population
  by: LING, Jiajing, et al.
  Published: (2022)