IMPLEMENTATION OF DEEP REINFORCEMENT LEARNING IN SOCCER SIMULATION 2D GAMES
Reinforcement learning is a branch of machine learning in which an agent learns to choose the best action for each state of its environment. Deep learning helps reinforcement learning represent large state spaces. By using deep reinforcement learning agents can play in their e...
Main Author: | Adi Kuncoro, Azis |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/40100 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:40100 |
---|---|
spelling |
id-itb.:40100 2019-07-01T08:29:13Z IMPLEMENTATION OF DEEP REINFORCEMENT LEARNING IN SOCCER SIMULATION 2D GAMES Adi Kuncoro, Azis Indonesia Final Project reinforcement learning, deep learning, multi agent, soccer simulation. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/40100 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Reinforcement learning is a branch of machine learning in which an agent learns to choose the best action for each state of its environment. Deep learning helps reinforcement learning represent large state spaces. With deep reinforcement learning, agents can learn to play in their environment without prior knowledge.
The soccer simulation 2D game is an environment that simulates soccer matches. One environment derived from it is Half Field Offense (HFO). HFO provides features that support reinforcement learning research: episodic learning, a choice between high-level and low-level action and state spaces, hand-coded and random agents that serve as baselines, and interfaces in Python and C++.
This final project uses the advantage actor-critic (A2C) method. The implementation has two deep neural networks, an actor network and a critic network. The actor network selects actions for the agent: it takes the HFO game state at a timestep as input and outputs the code of a discrete action. The critic network assesses how good the chosen action is given the state: it takes the state and the action chosen by the agent as input and outputs an evaluation value for taking that action in that state.
Two types of agents are trained, attacking agents and defending agents. The chosen game scenario is 5 vs 5, based on futsal, which uses that many players per side. Each agent has its own A2C model. The agents learn their coordination strategy during the training phase. Training runs for 10,000 epochs against hand-coded agents. The results show that A2C surpasses the random-agent baseline but still performs slightly below the hand-coded agents. |
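The actor and critic roles described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it swaps the deep networks for linear models, and it replaces HFO with a hypothetical one-step task in which action 1 pays a reward of 1 and action 0 pays 0. The class names, learning rates, and the `train` helper are all assumptions made for illustration. The critic takes the state and the chosen action as input, as the abstract states, and the advantage is taken as the critic's value for the chosen action minus the policy-weighted average over all actions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Actor:
    """Linear softmax policy: maps a state to probabilities over discrete actions."""

    def __init__(self, state_dim, n_actions, lr=0.1):
        self.w = [[0.0] * state_dim for _ in range(n_actions)]
        self.lr = lr

    def probs(self, state):
        return softmax([sum(w * s for w, s in zip(row, state)) for row in self.w])

    def update(self, state, action, advantage):
        # For a linear softmax policy, d log pi(a|s) / d w[k] = (1[k == a] - p_k) * s.
        p = self.probs(state)
        for k, row in enumerate(self.w):
            coeff = (1.0 if k == action else 0.0) - p[k]
            for i, s in enumerate(state):
                row[i] += self.lr * advantage * coeff * s


class Critic:
    """Linear critic over the state concatenated with a one-hot action."""

    def __init__(self, state_dim, n_actions, lr=0.1):
        self.n_actions = n_actions
        self.w = [0.0] * (state_dim + n_actions)
        self.lr = lr

    def _features(self, state, action):
        return list(state) + [1.0 if k == action else 0.0 for k in range(self.n_actions)]

    def value(self, state, action):
        return sum(w * f for w, f in zip(self.w, self._features(state, action)))

    def update(self, state, action, target):
        # Move the estimate for (state, action) toward the observed target.
        error = target - self.value(state, action)
        for i, f in enumerate(self._features(state, action)):
            self.w[i] += self.lr * error * f


def train(steps=2000, seed=0):
    """Train on a hypothetical one-step task: action 1 pays reward 1, action 0 pays 0."""
    rng = random.Random(seed)
    state, n_actions = [1.0], 2
    actor, critic = Actor(len(state), n_actions), Critic(len(state), n_actions)
    for _ in range(steps):
        p = actor.probs(state)
        action = rng.choices(range(n_actions), weights=p)[0]
        reward = 1.0 if action == 1 else 0.0
        # Advantage: value of the chosen action minus the policy-weighted average.
        baseline = sum(p[k] * critic.value(state, k) for k in range(n_actions))
        advantage = critic.value(state, action) - baseline
        actor.update(state, action, advantage)
        critic.update(state, action, reward)  # one-step episode, so the target is the reward
    return actor, critic
```

Calling `train()` and then `actor.probs([1.0])` shows the policy shifting toward the rewarded action. In a multi-step setting such as HFO, the critic target would also include a discounted estimate of future value, and each agent would hold its own actor and critic pair, as the abstract describes.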
format |
Final Project |
author |
Adi Kuncoro, Azis |
title |
IMPLEMENTATION OF DEEP REINFORCEMENT LEARNING IN SOCCER SIMULATION 2D GAMES |
url |
https://digilib.itb.ac.id/gdl/view/40100 |