Reinforcement learning and dynamic motion primitives
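Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios, which involve a complex interplay of competition and collaboration among agents operating in an unpredictable environment. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm by Unity that has the potential to bring the theory of multi-agent Reinforcement Learning to industrial applications. In this thesis, we study the concepts and literature of Reinforcement Learning that led to such a sophisticated algorithm. We then run experiments evaluating MA-POCA in simulated multi-agent environments. We find that MA-POCA uses a fixed ratio parameter to balance collaborative and competitive self-play. This introduces problems similar to those seen in Trust Region Policy Optimization (TRPO), which can be fixed using concepts from Proximal Policy Optimization (PPO). Further work is suggested to benchmark the performance improvements from such modifications.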
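The abstract's final claim rests on how TRPO and PPO restrain policy updates. As general background only, and not the thesis's own modification to MA-POCA (which this record does not describe), the minimal Python sketch below contrasts the unclipped surrogate objective, which TRPO maximizes subject to a KL-divergence constraint, with PPO's clipped surrogate objective. The function names, the sample numbers, and the clipping parameter `epsilon=0.2` (a commonly used value) are illustrative assumptions; log-probabilities and advantage estimates are taken as given rather than computed from a real policy.

```python
import math
from typing import Sequence


def probability_ratio(new_logp: float, old_logp: float) -> float:
    """r_t(theta) = pi_theta(a_t | s_t) / pi_theta_old(a_t | s_t), from log-probabilities."""
    return math.exp(new_logp - old_logp)


def unclipped_surrogate(ratios: Sequence[float], advantages: Sequence[float]) -> float:
    """Mean of r_t * A_t. TRPO maximizes this only inside a KL-divergence trust region."""
    return sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)


def ppo_clipped_surrogate(
    ratios: Sequence[float], advantages: Sequence[float], epsilon: float = 0.2
) -> float:
    """Mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), PPO's clipped objective.

    Clipping removes the incentive to push r_t outside [1 - eps, 1 + eps], giving a
    trust-region-like effect without TRPO's explicit KL constraint.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum makes each term a pessimistic (lower) estimate.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch: one action became much more likely, one much less likely.
    ratios = [1.5, 0.5, 1.0]
    advantages = [1.0, -1.0, 2.0]
    print(unclipped_surrogate(ratios, advantages))    # 1.0
    print(ppo_clipped_surrogate(ratios, advantages))  # approximately 0.8
```

With the ratio clipped to [0.8, 1.2], the first action contributes 1.2 instead of 1.5 and the second contributes -0.8 instead of -0.5, so the clipped objective never exceeds the unclipped one and large policy changes earn no extra credit.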
Main Author: | Mudgal, Saurabh |
---|---|
Other Authors: | Domenico Campolo |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Mechanical engineering |
Online Access: | https://hdl.handle.net/10356/150858 |
Institution: | Nanyang Technological University |

id | sg-ntu-dr.10356-150858 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-150858; 2021-06-03T06:21:44Z; Reinforcement learning and dynamic motion primitives; Mudgal, Saurabh; Domenico Campolo; School of Mechanical and Aerospace Engineering; d.campolo@ntu.edu.sg; Engineering::Mechanical engineering; Bachelor of Engineering (Mechanical Engineering); Final Year Project (FYP); Mudgal, S. (2021). Reinforcement learning and dynamic motion primitives. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/150858; en; application/pdf; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Mechanical engineering |
description | Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios, which involve a complex interplay of competition and collaboration among agents operating in an unpredictable environment. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm by Unity that has the potential to bring the theory of multi-agent Reinforcement Learning to industrial applications. In this thesis, we study the concepts and literature of Reinforcement Learning that led to such a sophisticated algorithm. We then run experiments evaluating MA-POCA in simulated multi-agent environments. We find that MA-POCA uses a fixed ratio parameter to balance collaborative and competitive self-play. This introduces problems similar to those seen in Trust Region Policy Optimization (TRPO), which can be fixed using concepts from Proximal Policy Optimization (PPO). Further work is suggested to benchmark the performance improvements from such modifications. |
author2 | Domenico Campolo |
format | Final Year Project |
author | Mudgal, Saurabh |
title | Reinforcement learning and dynamic motion primitives |
publisher | Nanyang Technological University |
publishDate | 2021 |
url | https://hdl.handle.net/10356/150858 |