USN: a robust imitation learning method against diverse action noise

Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations. That is, hire annotators to label a...

Bibliographic Details
Main Authors:	Yu, Xingrui, Han, Bo, Tsang, Ivor Wai-Hung
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2024
Subjects:	Computer and Information Science Learning methods Realistic scenario
Online Access:	https://hdl.handle.net/10356/179928
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-179928
record_format	dspace
spelling	sg-ntu-dr.10356-1799282024-09-06T15:36:13Z USN: a robust imitation learning method against diverse action noise Yu, Xingrui Han, Bo Tsang, Ivor Wai-Hung School of Computer Science and Engineering Centre for Frontier AI Research, A*STAR Singapore Institute of High-Performance Computing, A*STAR Computer and Information Science Learning methods Realistic scenario Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations. That is, hire annotators to label actions for existing image records in realistic scenarios. However, action noise can occur when annotators are not domain experts or encounter confusing states. In this work, we introduce two particular forms of action noise, i.e., state-independent and state-dependent action noise. Previous IL methods fail to achieve expert-level performance when the demonstrations contain action noise, especially the state-dependent action noise. To mitigate the harmful effects of action noises, we propose a robust learning paradigm called USN (Uncertainty-aware Sample-selection with Negative learning). The model first estimates the predictive uncertainty for all demonstration data and then selects samples with high loss based on the uncertainty measures. Finally, it updates the model parameters with additional negative learning on the selected samples. Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various action noises. The ratio of significant improvements is up to 94.44%. Moreover, our method scales to conditional imitation learning with real-world noisy commands in urban driving. 
Maritime and Port Authority of Singapore (MPA) National Research Foundation (NRF) National Supercomputing Centre (NSCC) Singapore Singapore Maritime Institute (SMI) Published version XY was supported by China Scholarship Council No. 201806450045, Australian Artificial Intelligence Institute (AAII), University of Technology Sydney (UTS), Australia, and Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore (https://www.a-star.edu.sg/cfar). BH was supported by the NSFC General Program No. 62376235, Guangdong Basic and Applied Basic Research Foundation No. 2022A1515011652, HKBU Faculty Niche Research Areas No. RC-FNRA-IG/22-23/SCI/04, and HKBU CSD Departmental Incentive Scheme. IWT was supported by Australian Artificial Intelligence Institute (AAII), University of Technology Sydney (UTS), Australia. This research was partially supported by the National Research Foundation, Singapore, and the Maritime and Port Authority of Singapore / Singapore Maritime Institute under the Maritime Transformation Programme (Maritime Artificial Intelligence (AI) Research Programme – Grant number SMI-2022-MTP-06). The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). 2024-09-03T05:13:38Z 2024-09-03T05:13:38Z 2024 Journal Article Yu, X., Han, B. & Tsang, I. W. (2024). USN: a robust imitation learning method against diverse action noise. Journal of Artificial Intelligence Research, 79, 1237-1280. https://dx.doi.org/10.1613/jair.1.15819 1076-9757 https://hdl.handle.net/10356/179928 10.1613/jair.1.15819 2-s2.0-85192802474 79 1237 1280 en SMI-2022-MTP-06 Journal of Artificial Intelligence Research © 2024 The Authors. Published by AI Access Foundation under Creative Commons Attribution License CC BY 4.0. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science Learning methods Realistic scenario
spellingShingle	Computer and Information Science Learning methods Realistic scenario Yu, Xingrui Han, Bo Tsang, Ivor Wai-Hung USN: a robust imitation learning method against diverse action noise
description	Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations. That is, hire annotators to label actions for existing image records in realistic scenarios. However, action noise can occur when annotators are not domain experts or encounter confusing states. In this work, we introduce two particular forms of action noise, i.e., state-independent and state-dependent action noise. Previous IL methods fail to achieve expert-level performance when the demonstrations contain action noise, especially the state-dependent action noise. To mitigate the harmful effects of action noises, we propose a robust learning paradigm called USN (Uncertainty-aware Sample-selection with Negative learning). The model first estimates the predictive uncertainty for all demonstration data and then selects samples with high loss based on the uncertainty measures. Finally, it updates the model parameters with additional negative learning on the selected samples. Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various action noises. The ratio of significant improvements is up to 94.44%. Moreover, our method scales to conditional imitation learning with real-world noisy commands in urban driving.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Yu, Xingrui Han, Bo Tsang, Ivor Wai-Hung
format	Article
author	Yu, Xingrui Han, Bo Tsang, Ivor Wai-Hung
author_sort	Yu, Xingrui
title	USN: a robust imitation learning method against diverse action noise
title_short	USN: a robust imitation learning method against diverse action noise
title_full	USN: a robust imitation learning method against diverse action noise
title_fullStr	USN: a robust imitation learning method against diverse action noise
title_full_unstemmed	USN: a robust imitation learning method against diverse action noise
title_sort	usn: a robust imitation learning method against diverse action noise
publishDate	2024
url	https://hdl.handle.net/10356/179928
_version_	1814047353567444992
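
The description field above outlines three steps: estimate predictive uncertainty, select high-loss samples, and apply negative learning to the selection. The following is a minimal NumPy sketch of that outline, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `select_and_negative_loss`, the `select_ratio` parameter, the use of averaged stochastic forward passes as the uncertainty-aware prediction, ranking by cross-entropy against the annotated action, and the negative-learning form -log(1 - p).

```python
import numpy as np

def select_and_negative_loss(probs_passes, labels, select_ratio=0.25, eps=1e-12):
    """Illustrative sketch only; all names and choices here are assumptions.

    probs_passes: array (T, N, A) of action probabilities from T stochastic
                  forward passes over N demonstration samples with A actions.
    labels:       array (N,) of annotated (possibly noisy) action indices.
    """
    labels = np.asarray(labels)
    # Step 1 (assumed form): average the stochastic passes to obtain an
    # uncertainty-aware predictive distribution for every sample.
    mean_probs = np.mean(probs_passes, axis=0)            # shape (N, A)
    n = mean_probs.shape[0]
    p_label = mean_probs[np.arange(n), labels]            # prob. of annotated action

    # Step 2 (assumed criterion): rank samples by cross-entropy against the
    # annotated action and keep the highest-loss fraction.
    ce = -np.log(p_label + eps)
    k = max(1, int(round(select_ratio * n)))
    selected = np.argsort(-ce)[:k]

    # Step 3 (assumed form): negative learning on the selected samples, which
    # penalises probability mass placed on the annotated action.
    negative_loss = -np.log(1.0 - p_label[selected] + eps)
    return ce, selected, negative_loss

# Toy data: 2 passes, 4 samples, 3 actions. Sample 2 has an implausible label.
probs_passes = np.array([
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.2, 0.1, 0.7]],
])
labels = np.array([0, 1, 2, 2])
ce, selected, negative_loss = select_and_negative_loss(probs_passes, labels)
```

In a training loop, the cross-entropy term would typically be minimised on the remaining samples while the negative term is added for the selected ones; how the published method weights or schedules these terms is not stated in this record.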
