Towards coordinated multi-agent exploration problem via segmentation and reinforcement learning

Bibliographic Details
Main Author: Chen, Zichen
Other Authors: Tan Ah Hwee
Format: Thesis-Master by Research
Language:English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/137152
Institution: Nanyang Technological University
Description
Summary:Exploring an unknown environment with multiple autonomous robots is a major challenge in the robotics domain. A robot, or agent, needs to incrementally construct a model or map representation of the environment while performing its domain tasks, such as surveillance, search and rescue, and cleaning. What the robot should do, or where it should go next, can only be determined after the map has been at least partially constructed. The typical approach is to take a frontier point, located on the boundary between a known area and an unknown region, as the target location to visit. This point is selected from the frontiers revealed as the robots observe the environment. When multiple robots are involved, however, the task becomes more challenging: the robots must explore the unknown environment as efficiently and quickly as possible while avoiding conflicts or interference that can reduce efficiency. To coordinate a team of autonomous robots efficiently, an effective approach is to partition the map of the environment into separate regions or segments, which are allocated to the robots as targets to visit. The partitioning must be performed continually and incrementally. There is a trade-off: generating many small segments provides more detail about the environment, but may lose the representation of larger areas that are useful and relevant to the exploration task at hand. This thesis introduces a Hierarchical Adaptive Clustering (HAC) segmentation of the indoor environment that strikes a balance between fine-grained clustering and generalized segmentation during exploration. With the HAC approach, an effective multi-agent task allocation approach is developed, in which the partitioning and allocation processes can be performed continually and incrementally in real time.
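To illustrate the frontier concept described above, the following is a minimal sketch (not the thesis's implementation) of frontier detection on an occupancy grid, assuming a common value convention of -1 for unknown, 0 for free, and 1 for occupied cells: a free cell that borders an unknown cell is a frontier.

```python
# Minimal frontier-detection sketch on a 2D occupancy grid.
# Assumed cell convention: -1 = unknown, 0 = free, 1 = occupied.

def find_frontiers(grid):
    """Return (row, col) cells that are free and adjacent to an unknown cell."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:          # only free cells can be frontiers
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == -1:
                    frontiers.append((r, c))
                    break                # one unknown neighbour is enough
    return frontiers

grid = [
    [0,  0, -1],
    [0,  1, -1],
    [0,  0,  0],
]
print(find_frontiers(grid))  # → [(0, 1), (2, 2)]
```

In a full exploration system, these frontier cells would be clustered into frontier regions and one would be chosen as the next target, e.g. by travel cost or expected information gain.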
Experimental results show that the HAC-based exploration method is comparable with other state-of-the-art approaches, including Frontier-based allocation and Voronoi-based exploration, and outperforms them in terms of meaningful topological clusters and efficient exploration. However, non-learning-based methods usually employ a fixed strategy to allocate robots or agents to explore selected locations, which sometimes cannot handle unpredictable and dynamic situations well. Such methods can be effective in the single-robot case, but assigning multiple robots to explore different locations is challenging, since individual robots may interfere with one another, making the overall task less efficient. A learning-based approach is proposed in this thesis to address these issues. The algorithm, called CNN-based Multi-agent Proximal Policy Optimization (CMAPPO), allocates multiple robots to explore different environments while improving their strategies over time to allocate tasks more efficiently and flexibly. It combines a CNN to process multi-channel visual inputs from the observed environment, curriculum learning to improve learning efficiency, and the PPO algorithm for motivation-based reinforcement learning. Based on the evaluation, CMAPPO learns a more efficient strategy for multiple robots (robots are referred to as agents in the remainder of this thesis) to explore the environment than the conventional frontier-based method. In summary, this thesis introduces a novel indoor-space segmentation-based exploration method, built on topological clusters of an enclosed environment, for multi-agent exploration. Considering dynamic situations in the environment, it further develops a new end-to-end deep reinforcement learning architecture for multi-agent exploration using a Convolutional Neural Network (CNN) and Proximal Policy Optimization (PPO).
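The PPO component mentioned above centres on a clipped surrogate objective that limits how far the updated policy can move from the one that collected the data. As a minimal illustrative sketch of that standard objective (not the thesis's CMAPPO implementation), the per-sample clipped term can be written as:

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective for one (state, action) sample.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities;
    the objective takes the minimum of the unclipped and clipped terms,
    so large policy updates stop being rewarded beyond the clip range.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)

# Unchanged policy: ratio = 1, objective equals the advantage.
print(ppo_clip_objective(0.0, 0.0, 1.0))              # → 1.0
# Ratio 1.5 with positive advantage is clipped at 1 + eps = 1.2.
print(ppo_clip_objective(math.log(1.5), 0.0, 1.0))    # → 1.2
```

In training, this objective is averaged over a batch, negated as a loss, and combined with a value-function loss and an entropy bonus; the CNN front end supplies the policy's features from the multi-channel map observation.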