Personalized robotic control via constrained multi-objective reinforcement learning
Reinforcement learning can deliver state-of-the-art performance in end-to-end robotic control tasks. Nevertheless, many real-world control tasks require balancing multiple conflicting objectives while ensuring that the learned policies satisfy constraints. Additionally, individual users may prefer to explore personalized and diversified robotic control modes through specific preferences. This paper therefore presents a novel constrained multi-objective reinforcement learning algorithm for personalized end-to-end robotic control with continuous actions, allowing a single trained model to approximate the Pareto-optimal policies for any user-specified preference. The proposed approach is formulated as a constrained multi-objective Markov decision process, incorporating a nonlinear constraint design that helps the agent learn optimal policies aligned with the specified user preferences across the entire preference space. Meanwhile, a comprehensive index based on hypervolume and entropy is presented to measure the convergence, diversity, and evenness of the learned control policies. The proposed scheme is evaluated on nine multi-objective end-to-end robotic control tasks with continuous action spaces, and its effectiveness is demonstrated in comparison with competitive baselines, including classical and state-of-the-art algorithms.
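The record does not give the paper's exact definition of its hypervolume-and-entropy index, but the two ingredients are standard Pareto-front quality measures. The sketch below is a minimal illustration under that assumption: a 2-objective hypervolume (captures convergence and diversity of the front) plus a normalized spacing entropy (captures evenness); the function names and the entropy formulation are hypothetical, not taken from the paper.

```python
import math

def hypervolume_2d(points, ref):
    """Hypervolume of a 2-objective front (maximization), measured
    against a reference point dominated by every solution."""
    # Sort by objective 1 descending and keep only nondominated points.
    pts = sorted(points, key=lambda p: (-p[0], -p[1]))
    front, best_f2 = [], float("-inf")
    for f1, f2 in pts:
        if f2 > best_f2:              # not dominated by an earlier point
            front.append((f1, f2))
            best_f2 = f2
    # Sweep: each point contributes a rectangle above the previous f2 level.
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        hv += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return hv

def spacing_entropy(points):
    """Normalized Shannon entropy of the gaps between consecutive front
    points along objective 1; 1.0 means perfectly even spacing."""
    f1s = sorted(p[0] for p in points)
    gaps = [b - a for a, b in zip(f1s, f1s[1:]) if b > a]
    if len(gaps) < 2:
        return 1.0
    total = sum(gaps)
    probs = [g / total for g in gaps]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(gaps))
```

For the front {(3, 1), (2, 2), (1, 3)} with reference point (0, 0), `hypervolume_2d` returns 6.0 (the area of the union of the three dominated rectangles), and `spacing_entropy` returns 1.0 because the points are evenly spaced. A combined index could, for instance, multiply or weight the two terms, but how the paper aggregates them is not stated in this record.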
Main Authors: He, Xiangkun; Hu, Zhongxu; Yang, Haohan; Lv, Chen
Other Authors: School of Mechanical and Aerospace Engineering; Continental-NTU Corporate Lab
Format: Article
Language: English
Published: 2024
Subjects: Engineering::Mechanical engineering; Reinforcement Learning; Multi-Objective Optimization
Online Access: https://hdl.handle.net/10356/173290
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-173290
record_format: dspace
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Mechanical engineering; Reinforcement Learning; Multi-Objective Optimization
author: He, Xiangkun; Hu, Zhongxu; Yang, Haohan; Lv, Chen
author2: School of Mechanical and Aerospace Engineering; Continental-NTU Corporate Lab
format: Article
title: Personalized robotic control via constrained multi-objective reinforcement learning
publishDate: 2024
url: https://hdl.handle.net/10356/173290
citation: He, X., Hu, Z., Yang, H. & Lv, C. (2024). Personalized robotic control via constrained multi-objective reinforcement learning. Neurocomputing, 565, 126986. https://dx.doi.org/10.1016/j.neucom.2023.126986
issn: 0925-2312
doi: 10.1016/j.neucom.2023.126986
scopus: 2-s2.0-85175714885
funding: Agency for Science, Technology and Research (A*STAR). This study is supported under the RIE2020 Industry Alignment Fund-Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).
rights: © 2023 Elsevier B.V. All rights reserved.
description: Reinforcement learning can deliver state-of-the-art performance in end-to-end robotic control tasks. Nevertheless, many real-world control tasks require balancing multiple conflicting objectives while ensuring that the learned policies satisfy constraints. Additionally, individual users may prefer to explore personalized and diversified robotic control modes through specific preferences. This paper therefore presents a novel constrained multi-objective reinforcement learning algorithm for personalized end-to-end robotic control with continuous actions, allowing a single trained model to approximate the Pareto-optimal policies for any user-specified preference. The proposed approach is formulated as a constrained multi-objective Markov decision process, incorporating a nonlinear constraint design that helps the agent learn optimal policies aligned with the specified user preferences across the entire preference space. Meanwhile, a comprehensive index based on hypervolume and entropy is presented to measure the convergence, diversity, and evenness of the learned control policies. The proposed scheme is evaluated on nine multi-objective end-to-end robotic control tasks with continuous action spaces, and its effectiveness is demonstrated in comparison with competitive baselines, including classical and state-of-the-art algorithms.