Reinforcement learning based algorithm design for mobile robot dynamic obstacle avoidance
Main Author:
Other Authors:
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2021
Subjects:
Online Access: https://hdl.handle.net/10356/152376
Institution: Nanyang Technological University
Summary: A good path planning strategy is an essential part of an autonomous navigation system operating in a new environment. In practice, the surrounding environment contains not only static obstacles but also dynamic ones. In the future, drone transportation may become a primary method of logistics, in which multiple robots commonly work together in a limited space. In some traditional path planning strategies, the routes of the robots are pre-defined because the environment is already known. In contrast, in most cases the robot cannot obtain global information about the environment and can only sense its immediate surroundings, so it must learn the changes in the environment.
In this dissertation, a grid world environment with static and dynamic obstacles is constructed, and reinforcement learning with two kinds of deep Q networks is used for training. After training, the agent knows how to avoid obstacles on its way to the terminal state. First, the basic theory of Q learning and the deep Q network (DQN) is introduced. In Q learning, the position of the agent is taken as the state, the epsilon-greedy algorithm is used as the exploration strategy, and the reward dictionary is updated at every step since the environment is always changing. After that, double DQN and dueling DQN are used to improve performance in dynamic environments. Finally, the networks are augmented with prioritized experience replay (PER), and their performances are compared.
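As a rough illustration of the tabular setting described in the summary, the sketch below shows epsilon-greedy Q learning in a small grid world with a moving obstacle, where the agent's grid position is the state and the reward table is recomputed at every step. The grid size, reward values, obstacle motion, and hyperparameters are illustrative assumptions, not values taken from the dissertation.

```python
# Minimal sketch: tabular Q-learning with epsilon-greedy exploration in a
# grid world whose reward table changes each step because an obstacle moves.
# All constants below are assumptions for illustration only.
import random
from collections import defaultdict

GRID = 5                                        # 5x5 grid (assumed size)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GOAL = (GRID - 1, GRID - 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1           # assumed hyperparameters

Q = defaultdict(float)                          # Q[(state, action_index)]

def obstacle_at(t):
    """Dynamic obstacle sweeping along the middle row (assumed motion)."""
    return (2, t % GRID)

def reward(state, t):
    """Reward 'dictionary' recomputed every step since the obstacle moves."""
    if state == GOAL:
        return 10.0
    if state == obstacle_at(t):
        return -10.0
    return -1.0                                 # step cost favors short paths

def step(state, a):
    """Apply an action and clamp the agent inside the grid."""
    dr, dc = ACTIONS[a]
    r = min(max(state[0] + dr, 0), GRID - 1)
    c = min(max(state[1] + dc, 0), GRID - 1)
    return (r, c)

def choose_action(state):
    """Epsilon-greedy exploration strategy."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

for episode in range(500):
    state, t = (0, 0), 0
    while state != GOAL and t < 100:
        a = choose_action(state)
        nxt = step(state, a)
        r = reward(nxt, t)
        best_next = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        # Q-learning update: Q <- Q + alpha * (r + gamma * max Q' - Q)
        Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
        state, t = nxt, t + 1
```

The deep variants mentioned in the summary (DQN, double DQN, dueling DQN, PER) replace the table Q with a neural network, but the epsilon-greedy action selection and per-step reward recomputation follow the same pattern.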