Reinforcement learning and dynamic motion primitives

Bibliographic Details
Main Author: Mudgal, Saurabh
Other Authors: Domenico Campolo
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/150858
Description
Summary: Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios, in which agents both compete and collaborate in an unpredictable environment. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm by Unity that has the potential to adapt the theory of multi-agent Reinforcement Learning to industrial applications. In this thesis, we study the concepts and literature of Reinforcement Learning that underlie this algorithm. We then run evaluative experiments that implement MA-POCA in simulated multi-agent environments. We find that MA-POCA uses a fixed ratio parameter to balance collaborative and competitive self-play. This introduces problems similar to those seen in Trust Region Policy Optimization (TRPO), which can be addressed using concepts from Proximal Policy Optimization (PPO). Further work is suggested to benchmark the performance improvements from such modifications.
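
Note: The PPO concept the summary refers to is most commonly the clipped surrogate objective, which limits how far a single policy update can move without requiring a hand-tuned trust-region constraint. The following is a minimal sketch of that standard objective only. It is not taken from the thesis and does not reproduce MA-POCA; the function name `ppo_clipped_objective` and the default `epsilon=0.2` are illustrative assumptions.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    epsilon:    clip range (assumed default; not a value from the thesis)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

With a positive advantage, a ratio above `1 + epsilon` earns no extra objective value, so the update has no incentive to push the policy further in that direction.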