Learning high-level robotic manipulation actions with visual predictive model
| Main Authors: | |
|---|---|
| Other Authors: | |
| Format: | Article |
| Language: | English |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://hdl.handle.net/10356/173569 |
| Institution: | Nanyang Technological University |
| Abstract: | Learning visual predictive models has great potential for real-world robot manipulation. Visual predictive models serve as a model of real-world dynamics to comprehend the interactions between the robot and objects. However, prior works in the literature have focused mainly on low-level elementary robot actions, which typically result in lengthy, inefficient, and highly complex robot manipulation. In contrast, humans usually employ top–down thinking of high-level actions rather than bottom–up stacking of low-level ones. To address this limitation, we present a novel formulation for robot manipulation that can be accomplished through grasping by pick-and-place, a commonly applied high-level robot action. We propose a novel visual predictive model that combines an action decomposer and a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model can accurately predict the object dynamics (i.e., the object movements under robot manipulation) while trained directly on observations of high-level pick-and-place actions. We also demonstrate that, together with a sampling-based planner, our model achieves a higher success rate using high-level actions on a variety of real robot manipulation tasks. |
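To make the planning idea in the abstract concrete, below is a minimal, purely illustrative sketch of sampling-based planning over high-level pick-and-place actions scored by a visual predictive model. All names here (`PickPlaceAction`, `predict_rollout`, `plan_pick_and_place`) are hypothetical placeholders and the predictive model is stubbed out; this is not the paper's actual architecture or API, only an assumption-laden example of the general technique.

```python
# Illustrative sketch: random-shooting planner over high-level pick-and-place
# actions, scored by a (stubbed) visual predictive model. Names are hypothetical.
from dataclasses import dataclass
import numpy as np


@dataclass
class PickPlaceAction:
    """Hypothetical high-level action: a pick location and a place location."""
    pick_xy: np.ndarray
    place_xy: np.ndarray


def predict_rollout(obs_image, action):
    """Stand-in for a learned visual predictive model: given the current camera
    image and a candidate high-level action, return the predicted future image.
    Here it is a dummy that simply returns the input image."""
    return obs_image


def plan_pick_and_place(obs_image, goal_image, n_samples=256, workspace=(0.0, 1.0)):
    """Sample candidate pick-and-place actions, predict each outcome with the
    visual model, and keep the action whose prediction is closest to the goal."""
    lo, hi = workspace
    best_action, best_cost = None, np.inf
    for _ in range(n_samples):
        action = PickPlaceAction(
            pick_xy=np.random.uniform(lo, hi, size=2),
            place_xy=np.random.uniform(lo, hi, size=2),
        )
        predicted = predict_rollout(obs_image, action)
        cost = np.mean((predicted - goal_image) ** 2)  # pixel-wise distance to goal
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action


if __name__ == "__main__":
    current = np.random.rand(64, 64, 3)  # placeholder current observation
    goal = np.random.rand(64, 64, 3)     # placeholder goal image
    print(plan_pick_and_place(current, goal))
```

In practice, the dummy `predict_rollout` would be replaced by the learned model (in the paper, an action decomposer combined with a video prediction network), and the goal-matching cost could use a learned distance instead of raw pixel error.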