Reinforcement learning and dynamic motion primitives
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2021
Subjects:
Online Access: https://hdl.handle.net/10356/150858
Summary: Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios in which agents compete and collaborate in an unpredictable environment. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm by Unity that has the potential to adapt the theory of multi-agent Reinforcement Learning to industrial applications. In this thesis, we review the underlying concepts and literature of Reinforcement Learning that lead to such a sophisticated algorithm. We then run evaluative experiments implementing the MA-POCA algorithm in simulated multi-agent environments. We find that MA-POCA uses a fixed ratio parameter to balance collaborative and competitive self-play. This introduces problems similar to those seen in Trust Region Policy Optimization (TRPO), which can be addressed using concepts from Proximal Policy Optimization (PPO). Further work is suggested to benchmark the performance improvements from such modifications.
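The summary's contrast between TRPO and PPO rests on how each method keeps a policy update close to the previous policy: TRPO enforces an explicit trust-region constraint, while PPO simply clips the probability ratio in its surrogate objective. The sketch below illustrates that standard PPO clipped objective in PyTorch; it is an illustrative example only, not the thesis's actual modification to MA-POCA, and the function name `clipped_surrogate_loss` and the default `epsilon` are assumptions chosen for the example.

```python
import torch


def clipped_surrogate_loss(new_log_probs: torch.Tensor,
                           old_log_probs: torch.Tensor,
                           advantages: torch.Tensor,
                           epsilon: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate objective (returned as a loss to minimize).

    Clipping the probability ratio keeps each update close to the old policy,
    which is the role the trust-region constraint plays in TRPO.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Pessimistic (element-wise minimum) objective, negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    # Toy batch: log-probabilities under the new and old policies plus advantages.
    new_lp = torch.tensor([-0.9, -1.2, -0.3])
    old_lp = torch.tensor([-1.0, -1.0, -1.0])
    adv = torch.tensor([1.0, -0.5, 2.0])
    print(clipped_surrogate_loss(new_lp, old_lp, adv))
```

Because the clipping is a fixed, per-sample bound rather than a global constraint, it avoids the second-order machinery TRPO needs while still limiting how far a single update can move the policy.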