Learning-based autonomous navigation for mobile robot in unknown environments

As an essential capability for mobile robots, autonomous navigation has been extensively studied for decades. However, it remains challenging due to the inherent complexity and diversity of the problem, especially when the robots are deployed in unknown environments. The traditional methods highly r...

Full description

Saved in:
Bibliographic Details
Main Author: Wen, Mingxing
Other Authors: Wang Dan Wei
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/166641
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:As an essential capability for mobile robots, autonomous navigation has been extensively studied for decades. However, it remains challenging due to the inherent complexity and diversity of the problem, especially when the robots are deployed in unknown environments. The traditional methods highly rely on maintaining a global consistent map to re-localize, plan and control the mobile robots, which significantly increase the computation complexity. Besides, they often include a quantity of manually designed and tuned parameters, leading to poor generalizability. To overcome the bottlenecks of traditional methods, learning-based approaches have gained much popularity in the past years. Most of the existing learning-based approaches to autonomous navigation in unknown environments are subject to certain challenges such as poor interpretability (End-to-End manner), high dependence on large labelled data set, lack of robust and affordable perception capability, and sample inefficiency. This thesis has investigated several machine learning methods to address the problems mentioned above, and proposes a novel learning framework for learning autonomous navigation efficiently in unknown environments. First of all, an imitation learning based autonomous navigation in unknown environments has been proposed and evaluated in real world sidewalk navigation task. Autonomous navigation along a sidewalk is an important capability for some service robots, such as last mile delivery robots. The current solutions for sidewalk navigation heavily rely on the image-based perception and GPS localization to navigate along the sidewalk successfully. However, it is prone to fail and become unreliable when the robot runs in challenging conditions, such as operating in different illuminations, or under canopies of trees or buildings. To address these issues, a novel robust sidewalk navigation method for the last-mile delivery robots with an affordable sparse LiDAR is proposed, which consists of two main modules: Semantic Point Cloud Network (SegPCn) and Reactive Navigation Network (RNn). More specifically, SegPCn takes the raw 3D point cloud as input and predicts the point-wise segmentation labels, presenting a robust perception capability even in the night. Then, the semantic point clouds are fed to RNn to generate an angular velocity to navigate the robot along the sidewalk, where the localization of the robot is not required. By decomposing the sidewalk navigation into perception and reactive navigation sub-modules, the interpretability of the whole system has also been improved. Moreover, an autolabeling mechanism is developed to reduce the labor involved in data preparation as well. And the LSTM neural network is explored to effectively leverage the historical context and derive correct decisions. Extensive experiments have been carried out to verify the efficacy of this method, and the results show that this method enables the robot to navigate on the sidewalk robustly during day and night. Secondly, a reinforcement learning based target-driven autonomous navigation in unknown environments has been developed and verified in Carla self-driving car simulation. By introducing reinforcement learning, the dependence on a large labelled data set is released. For autonomous navigation of autonomous vehicle, the current solutions mainly rely on either human pre-defined rules or a precise high-resolution map, which are not feasible for the unknown environments, especially when there are some extreme situations not described in the driving rules. In that case, a new reinforcement learning based method is proposed to conquer these issues. First, a pre-trained VAE (Variational AutoEncoder) is used to extract representative features from road images, and PPO (Proximal Policy Optimization) algorithm is implemented to learn target-driven navigation for the autonomous vehicle to eliminate the dependence on the map and predefined rules. Moreover, in order to improve the learning efficiency, human driver experience is introduced and how to effectively incorporate human driver experience into reinforcement learning is also investigated. To verify the performance, this algorithm is implemented and deployed in CARLA simulation environments and extensive experiments are conducted to select the effective strategy of reusing driving experience. The results prove that our algorithm can successfully navigate the vehicle in the urban environment without map or any predefined rules. And by integrating human driver experience, the learning efficiency is dramatically improved, especially when using Ratio strategy to reuse the human driver experience. Finally, a novel learning framework, Couple Imitation and Reinforcement Learning (CIRL), for learning autonomous navigaiton efficiently in unknown environments has been proposed and evaluated in both simulation and real word experiments. This learning framework takes advantages of imitation learning and reinforcement learning to address poor generalizability, sample inefficiency problems. CIRL consists of two key stages: Bootstrapping IMitation learning (BIM) and Near-Optimal Policy (NOP). Specifically, as a new imitation learning method, BIM only uses one single demonstrated navigation trajectory as input to pre-train the neural networks of both actor and critic simultaneously. In the NOP stage, the demonstrated trajectory and pre-trained model are then taken as a benchmark to filter the experience generated by pure reinforcement learning to maintain the quality of experience in replay buffer. Following the hypothesis that the quality of experience in reply buffer is superior to the diversity, the CIRL framework can achieve greatly improved sample efficiency and training safety since it makes better use of demonstration data to guide how to keep high quality of experience for training. In order to verify and evaluate the performance of the CIRL framework, simulation and real-world mobile robot navigation experiments are carried out. The results show that by leveraging CIRL, the sample efficiency (success rate) increases from 24.4% to 72.4%, and training safety (collision rate) improves from 68.5% to 24.5%. And we also prove that by successfully applying in the two state-of-the-art reinforcement learning algorithms, namely Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3), CIRL is capable to be easily integrated into any Actor-Critic reinforcement learning algorithms.