Human-guided reinforcement learning: methodology and application to autonomous driving

Bibliographic Details
Main Author: Wu, Jingda
Other Authors: Lyu Chen
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Mechanical engineering::Motor vehicles
Online Access: https://hdl.handle.net/10356/169780
Institution: Nanyang Technological University
DOI: 10.32657/10356/169780
School: School of Mechanical and Aerospace Engineering
Supervisor Contact: lyuchen@ntu.edu.sg
Citation: Wu, J. (2023). Human-guided reinforcement learning: methodology and application to autonomous driving. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/169780
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
File Format: application/pdf
Description: Thriving artificial intelligence (AI) technologies are being used to address a wide range of challenges in the physical world, and AI methods are now widely applied to perception, decision-making, and control in many autonomous systems. As a typical real-world application of AI, autonomous vehicles (AVs) have great potential to enhance the safety, intelligence, and sustainability of mobility systems. Although AI has achieved remarkable success in information extraction for AV perception systems, AI-enabled reasoning and decision-making methods, such as reinforcement learning (RL), still exhibit limitations in AVs' behavioral planning. RL learns about the controlled environment through exploratory trial and error and optimizes its policy independently, yet its difficulty in incorporating prior knowledge makes it susceptible to local optima in complex or sparse-feedback problems and can lead to poor learning outcomes. Since humans, with their prior knowledge and reasoning abilities, can handle complex tasks such as driving, fusing human intelligence with RL is a promising route toward more advanced AVs.

In this thesis, we develop a novel human-guided RL framework for the behavioral planning of AVs. We propose a series of methods that improve RL performance from different algorithmic perspectives by introducing human guidance into RL's exploratory learning process. We begin by incorporating pre-collected human demonstrations as high-quality data in the RL agent's exploratory data set, which leverages human prior knowledge to increase the efficiency of data acquisition. The proposed human-demonstration-aided RL method is instantiated in a value-based RL algorithm, applied to tactical decision-making for AVs on multi-lane highways, and shown to be advantageous over conventional RL-based approaches.

Then, to improve the efficiency of data utilization, we propose novel RL algorithms that additionally mimic human behavior, enabling more efficient and robust learning from human guidance. For RL algorithms without an explicit policy function, we develop a new objective term, added to the RL value function, that confers additional value on human-demonstrated experiences. This objective encourages RL to follow human driving behavior, which narrows the exploration space and accelerates learning. Using a discrete-action RL algorithm as the backbone, the method enables AVs to make tactical decisions in complex off-ramp scenarios. For RL algorithms with an explicit policy function, we add a similar objective term to the policy function, encouraging RL to learn human behavior more directly. Furthermore, we propose an advantage-based weighting mechanism for this objective, which adapts the extent of learning from human guidance according to the advantage of human actions over RL actions; as a result, our method is more robust to fluctuations in human performance than existing baselines. We implement this method in a continuous-action RL algorithm for end-to-end planning of AVs and validate it with human-in-the-loop experiments.

The proposed human-guided framework is further advanced by improving the utilization efficiency of scarce human guidance data. A human-prioritized experience replay mechanism is proposed to prioritize human guidance over other data.
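As an illustration of the human-prioritized experience replay just described, the following minimal Python sketch boosts the sampling weight of human-demonstrated transitions. The buffer class, the human_boost factor, and the weighted-sampling rule are illustrative assumptions, not the exact design used in the thesis.

```python
import random
from collections import deque

class HumanPrioritizedReplayBuffer:
    """Minimal sketch of a replay buffer that favors human-demonstrated
    transitions. The boost factor and sampling rule are assumptions."""

    def __init__(self, capacity=100_000, human_boost=4.0):
        self.buffer = deque(maxlen=capacity)
        self.human_boost = human_boost  # extra sampling weight for human data

    def add(self, state, action, reward, next_state, done, from_human=False):
        self.buffer.append((state, action, reward, next_state, done, from_human))

    def sample(self, batch_size):
        # Scarce human guidance is replayed more often than the agent's own
        # exploratory data by assigning it a larger sampling weight.
        data = list(self.buffer)
        weights = [self.human_boost if t[-1] else 1.0 for t in data]
        return random.choices(data, weights=weights, k=batch_size)
```

A production implementation would more likely use a sum-tree prioritized buffer with a priority bonus for human samples, but the weighted draw above conveys the idea.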
Furthermore, a human-intervention-based reward-shaping mechanism is proposed to penalize unfavorable RL actions and reduce persistent reliance on human guidance (a minimal sketch follows this description). The resulting human-guided RL is shown to outperform state-of-the-art baselines across a wide range of AV behavioral-planning problems.

Finally, all of the above ingredients are combined into a complete human-guided RL framework, which we use to address goal-conditioned navigation problems for unmanned ground vehicles (UGVs). Aided by sim-to-real techniques, the framework is validated in simulation, deployed on real-world UGVs, and evaluated against existing model-based and learning-based methods. Results suggest that the proposed human-guided RL framework can remarkably improve the learning efficiency and performance of RL agents. A series of simulations and real-world experiments shows that our method outperforms vanilla RL, imitation learning, existing human-guided RL methods, and other conventional model-based and learning-based approaches on the behavioral-planning problems of AVs. Human-in-the-loop experiments further show that our methodology reduces the workload and proficiency required of human participants. The methodology proposed in this thesis can contribute to the development of learning-based autonomous driving techniques and has the potential to be applied in a broader range of contexts.
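Similarly, the human-intervention-based reward shaping described above can be sketched in a few lines. The penalty constant and the intervention flag are hypothetical placeholders rather than values taken from the thesis.

```python
def shaped_reward(env_reward: float, human_intervened: bool,
                  intervention_penalty: float = -1.0) -> float:
    """Penalize the agent action that triggered a human takeover.

    When the human intervenes, the transition stored for the agent's own
    (overridden) action receives an extra penalty, discouraging behavior
    that would require persistent human guidance.
    """
    return env_reward + (intervention_penalty if human_intervened else 0.0)
```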