Human-guided reinforcement learning: methodology and application to autonomous driving
Main Author: Wu, Jingda
Other Authors: Lyu Chen (lyuchen@ntu.edu.sg)
School: School of Mechanical and Aerospace Engineering
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Mechanical engineering::Motor vehicles
DOI: 10.32657/10356/169780
Online Access: https://hdl.handle.net/10356/169780
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Citation: Wu, J. (2023). Human-guided reinforcement learning: methodology and application to autonomous driving. Doctoral thesis, Nanyang Technological University, Singapore.
Institution: Nanyang Technological University
Description:
Thriving artificial intelligence (AI) technologies are being used to address various challenges in the physical world, and AI methods are now widely applied to perception, decision-making, and control in many autonomous systems. As a typical real-world application of AI, autonomous vehicles (AVs) have great potential to enhance the safety, intelligence, and sustainability of mobility systems. Although AI has achieved remarkable success in information extraction for AV perception systems, AI-enabled reasoning and decision-making capabilities, such as reinforcement learning (RL) methods, still exhibit limitations in AVs' behavioral planning. RL learns about the controlled environment through an exploratory trial-and-error mechanism and optimizes its policy independently, yet its difficulty in incorporating prior knowledge makes it susceptible to local optima in complex or sparse-feedback problems and leads to poor learning outcomes. Since humans, equipped with prior knowledge and reasoning abilities, can handle complex tasks such as driving, fusing human intelligence with RL is a promising route to more capable AVs.
In this thesis, we develop a novel human-guided RL framework for the behavioral planning of AVs. We propose a series of methods that improve RL performance from different algorithmic perspectives by introducing human guidance into the RL agent's exploratory learning process.
We start by incorporating pre-collected human demonstrations as high-quality data in the RL agent's exploration data set, which improves the efficiency of data acquisition by leveraging human prior knowledge. The proposed human-demonstration-aided RL method is instantiated in a value-based RL algorithm and used to address the tactical decision-making of AVs on multi-lane highways, where it is shown to be advantageous over conventional RL-based approaches.
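As a rough illustration of this idea, the sketch below seeds a value-based agent's replay buffer with pre-collected human transitions so that each training batch draws partly on demonstration data. All names (`DemoSeededReplayBuffer`, `demo_fraction`) are hypothetical and not taken from the thesis.

```python
import random
from collections import deque

class DemoSeededReplayBuffer:
    """Replay buffer mixing pre-collected human demonstrations with
    the agent's own exploratory transitions (illustrative only)."""

    def __init__(self, capacity, human_demos, demo_fraction=0.25):
        self.agent_data = deque(maxlen=capacity)
        self.human_data = list(human_demos)   # (s, a, r, s', done) tuples
        self.demo_fraction = demo_fraction    # share of each batch from demos

    def add(self, transition):
        self.agent_data.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share of each batch from human demonstrations,
        # and the remainder from the agent's own exploration.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.human_data))
        n_agent = min(batch_size - n_demo, len(self.agent_data))
        batch = random.sample(self.human_data, n_demo)
        batch += random.sample(list(self.agent_data), n_agent)
        return batch
```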
Then, to improve the efficiency of data utilization, we propose novel RL algorithms that additionally mimic human behavior, allowing more efficient and robust learning from human guidance. For RL algorithms without explicit policy functions, we develop a new objective term that is added to the RL value function to confer additional value on human-demonstrated experiences. This objective encourages the agent to follow human driving behaviors, which narrows the exploration space and accelerates learning. Using a discrete-action RL algorithm as the backbone, the proposed method enables AVs to execute tactical decisions in complex off-ramp scenarios.
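The abstract does not spell out this objective, but a common way to confer extra value on demonstrated actions in a value-based setting is a large-margin supervised term added to the TD loss, in the spirit of DQfD. The sketch below is an assumption-laden illustration, not the thesis's exact formulation; `margin` and `lam` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def demo_guided_q_loss(q_net, batch, is_demo, margin=0.8, lam=1.0):
    """TD loss plus a margin term pushing Q(s, a_human) above Q(s, a)
    for all other actions on human-demonstrated transitions.
    Illustrative sketch; not the thesis's exact objective."""
    states, actions, td_targets = batch            # precomputed TD targets
    q_all = q_net(states)                          # (B, num_actions)
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)

    td_loss = F.smooth_l1_loss(q_taken, td_targets)

    # Margin term: max_a [Q(s,a) + margin * 1(a != a_demo)] - Q(s, a_demo),
    # applied only to transitions flagged as human demonstrations.
    margins = torch.full_like(q_all, margin)
    margins.scatter_(1, actions.unsqueeze(1), 0.0)  # no margin on demo action
    margin_term = (q_all + margins).max(dim=1).values - q_taken
    demo_loss = (margin_term * is_demo.float()).mean()

    return td_loss + lam * demo_loss
```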
For RL algorithms with explicit policy functions, we develop a similar objective term, added to the policy objective, that encourages the agent to learn human behavior more directly. Furthermore, we propose an innovative advantage-based weighting mechanism for this objective, which adapts how strongly the agent learns from human guidance according to the advantage of human actions over the agent's own actions. As a result, our method is more robust to fluctuations in human performance than existing baselines. We implement this method in a continuous-action RL algorithm for end-to-end planning of AVs and validate it with human-in-the-loop experiments.
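One plausible reading of this weighting, sketched below for an actor-critic agent: the imitation term on each human-guided sample is scaled by the clipped advantage of the human's action over the current policy's action, so samples where the human clearly outperforms the agent dominate the update. The function and variable names here are hypothetical.

```python
import torch

def advantage_weighted_imitation_loss(actor, critic, states, human_actions):
    """Imitation term whose per-sample weight grows with the estimated
    advantage of the human action over the current policy's action.
    A sketch under assumed interfaces, not the thesis's exact loss."""
    policy_actions = actor(states)                   # (B, action_dim)

    with torch.no_grad():
        q_human = critic(states, human_actions)      # (B, 1)
        q_policy = critic(states, policy_actions)    # (B, 1)
        # Clipped advantage: only imitate where the human looks better.
        weights = torch.clamp(q_human - q_policy, min=0.0)

    per_sample = ((policy_actions - human_actions) ** 2).mean(dim=1, keepdim=True)
    return (weights * per_sample).mean()
```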
The proposed human-guided framework is further advanced by improving the utilization efficiency of scarce human guidance data. A human-prioritized experience replay mechanism is proposed to prioritize human guidance over other data, and a human-intervention-based reward-shaping scheme is proposed to penalize the agent's unfavorable actions and reduce persistent reliance on human guidance. The resulting human-guided RL is shown to outperform state-of-the-art baselines on a wide range of AV behavioral planning problems.
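Taken together, the two mechanisms might look roughly like the following: human-guided transitions receive a priority bonus when stored, and a penalty is subtracted from the reward at steps where the human intervened. `priority_bonus`, `intervention_penalty`, and the `buffer.add` interface are assumptions for illustration.

```python
def store_transition(buffer, transition, td_error,
                     is_human, human_intervened,
                     priority_bonus=1.0, intervention_penalty=0.5,
                     eps=1e-3):
    """Store a transition with (a) a priority bonus for human-guided
    data and (b) a shaped reward penalizing steps where the human
    had to intervene. Illustrative sketch only."""
    s, a, r, s_next, done = transition

    # Intervention-based reward shaping: the agent's action was
    # unfavorable enough that the human took over, so penalize it.
    if human_intervened:
        r = r - intervention_penalty

    # Human-prioritized experience replay: boost the sampling
    # priority of human-guided data on top of the usual TD-error rule.
    priority = abs(td_error) + eps
    if is_human:
        priority += priority_bonus

    buffer.add((s, a, r, s_next, done), priority)
```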
Finally, all of the above ingredients are combined into a complete human-guided RL framework, which we use to address goal-conditioned navigation problems for unmanned ground vehicles (UGVs). Using sim-to-real techniques, the framework is validated in simulation, deployed on real-world UGVs, and evaluated against existing model-based and learning-based methods.
Results suggest that the proposed human-guided RL framework can remarkably improve the learning efficiency and performance of RL. A series of simulations and real-world experiments show that our method outperforms vanilla RL, imitation learning, existing human-guided RL methods, and other conventional model-based and learning-based methods on behavioral planning problems of AVs. Human-in-the-loop experiments further show that our methodology reduces the workload and the proficiency demanded of human participants. The methodology proposed in this thesis can contribute to the development of learning-based autonomous driving techniques and has the potential to be applied in a broader range of contexts.