Reinforcement learning and dynamic motion primitives

Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios, in which agents in an unpredictable environment interact through a complex interplay of competition and collaboration. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm...

Bibliographic Details
Main Author: Mudgal, Saurabh
Other Authors: Domenico Campolo
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects: Engineering::Mechanical engineering
Online Access: https://hdl.handle.net/10356/150858
Institution: Nanyang Technological University
id sg-ntu-dr.10356-150858
record_format dspace
spelling sg-ntu-dr.10356-1508582021-06-03T06:21:44Z Reinforcement learning and dynamic motion primitives Mudgal, Saurabh Domenico Campolo School of Mechanical and Aerospace Engineering d.campolo@ntu.edu.sg Engineering::Mechanical engineering Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios, in which agents in an unpredictable environment interact through a complex interplay of competition and collaboration. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm by Unity that has the potential to adapt the theories of multi-agent Reinforcement Learning to industrial applications. In this thesis, we study the underlying concepts and literature of Reinforcement Learning that lead to such a sophisticated algorithm. Following that, we run evaluative experiments implementing the MA-POCA algorithm in simulated multi-agent environments. We find that MA-POCA uses a fixed ratio parameter to balance collaborative and competitive self-play. This introduces problems similar to those seen in Trust Region Policy Optimization (TRPO), which can be addressed using concepts from Proximal Policy Optimization (PPO). Further work is suggested to benchmark performance improvements from such modifications. Bachelor of Engineering (Mechanical Engineering) 2021-06-03T06:21:44Z 2021-06-03T06:21:44Z 2021 Final Year Project (FYP) Mudgal, S. (2021). Reinforcement learning and dynamic motion primitives. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/150858 https://hdl.handle.net/10356/150858 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Mechanical engineering
spellingShingle Engineering::Mechanical engineering
Mudgal, Saurabh
Reinforcement learning and dynamic motion primitives
description Multi-agent algorithms in Reinforcement Learning closely approximate real-world scenarios, in which agents in an unpredictable environment interact through a complex interplay of competition and collaboration. MultiAgent POsthumous Credit Assignment (MA-POCA) is a novel algorithm by Unity that has the potential to adapt the theories of multi-agent Reinforcement Learning to industrial applications. In this thesis, we study the underlying concepts and literature of Reinforcement Learning that lead to such a sophisticated algorithm. Following that, we run evaluative experiments implementing the MA-POCA algorithm in simulated multi-agent environments. We find that MA-POCA uses a fixed ratio parameter to balance collaborative and competitive self-play. This introduces problems similar to those seen in Trust Region Policy Optimization (TRPO), which can be addressed using concepts from Proximal Policy Optimization (PPO). Further work is suggested to benchmark performance improvements from such modifications.
author2 Domenico Campolo
author_facet Domenico Campolo
Mudgal, Saurabh
format Final Year Project
author Mudgal, Saurabh
author_sort Mudgal, Saurabh
title Reinforcement learning and dynamic motion primitives
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/150858
_version_ 1702431197234200576