Bibliographic Details
Main Author: NUR KARIMAH (NIM: 13514106), HASNA
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/27773
Institution: Institut Teknologi Bandung
Description
Summary: Machine learning, including reinforcement learning, is an actively developing field. Monte Carlo simulation is one way to solve reinforcement learning problems: it learns from randomly generated episodes. Although the method is well known, it is implemented less often than alternatives such as Dynamic Programming and Temporal-Difference Learning. Yet Monte Carlo simulation has advantages over both, among them that it learns from actual sampled episodes, so its estimates are not biased. In this final project, Monte Carlo simulation is used to solve a reinforcement learning problem, namely pathfinding in a gridworld environment. The influence of the learning parameters and of the random number generator seed on the resulting solution is then studied. The experiments yield several main findings. First, the random number generator seed has no significant influence on the solution. Second, the learning parameters do influence the results: a larger number of episodes and a larger number of steps give better results, but training takes longer. The value of ε, the probability of taking a random action, also affects the results. The larger ε is, the more often a random action is chosen instead of the best action and the longer training takes, but the higher the percentage of episodes that reach the goal.
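The approach summarized above can be illustrated with a minimal sketch of first-visit Monte Carlo control with an ε-greedy policy. This is not the author's implementation: the 4×4 grid, the start and goal cells, the reward of -1 per step, the discount factor `gamma`, the first-visit variant, and the names `step` and `mc_control` are all assumptions made for illustration. Only the tunable quantities (number of episodes, number of steps, ε, and the random seed) come from the summary.

```python
import random
from collections import defaultdict

# Hypothetical action set: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, size, goal):
    """Move within the grid; bumping into a wall leaves the state unchanged."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    next_state = (row, col)
    # Assumed reward scheme: -1 per step, 0 on reaching the goal.
    reward = 0.0 if next_state == goal else -1.0
    return next_state, reward, next_state == goal


def mc_control(episodes=500, max_steps=50, epsilon=0.1, gamma=1.0, size=4, seed=0):
    """Return (Q-table, fraction of training episodes that reached the goal)."""
    rng = random.Random(seed)          # the seed is one of the studied factors
    start, goal = (0, 0), (size - 1, size - 1)
    q = defaultdict(float)             # (state, action index) -> mean return
    visits = defaultdict(int)          # (state, action index) -> update count
    goals_reached = 0

    for _ in range(episodes):
        state, trajectory = start, []
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                            # random action
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])  # best action
            next_state, reward, done = step(state, ACTIONS[a], size, goal)
            trajectory.append((state, a, reward))
            state = next_state
            if done:
                goals_reached += 1
                break

        # Record the first time each (state, action) pair occurs in the episode.
        first_index = {}
        for t, (s, a, _) in enumerate(trajectory):
            first_index.setdefault((s, a), t)

        # Walk backwards, accumulating the return and updating first visits only.
        g = 0.0
        for t in range(len(trajectory) - 1, -1, -1):
            s, a, r = trajectory[t]
            g = gamma * g + r
            if first_index[(s, a)] == t:
                visits[(s, a)] += 1
                q[(s, a)] += (g - q[(s, a)]) / visits[(s, a)]  # incremental mean

    return dict(q), goals_reached / episodes
```

Calling `mc_control` with different values of `episodes`, `max_steps`, `epsilon`, and `seed` and comparing the returned goal fraction is one way to set up the kind of parameter study the summary describes. The numbers such a sketch produces say nothing about the results reported in the thesis.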