Learning high-level robotic manipulation actions with visual predictive model
Learning visual predictive models has great potential for real-world robot manipulations. Visual predictive models serve as a model of real-world dynamics to comprehend the interactions between the robot and objects. However, prior works in the literature have focused mainly on low-level elementary...
Saved in:
Main Authors: | Ma, Anji, Chi, Guoyi, Ivaldi, Serena, Chen, Lipeng |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering; Robot manipulation; Visual foresight |
Online Access: | https://hdl.handle.net/10356/173569 |
Tags: | |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173569 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1735692024-02-16T15:39:21Z Learning high-level robotic manipulation actions with visual predictive model Ma, Anji Chi, Guoyi Ivaldi, Serena Chen, Lipeng School of Electrical and Electronic Engineering Engineering Robot manipulation Visual foresight Learning visual predictive models has great potential for real-world robot manipulations. Visual predictive models serve as a model of real-world dynamics to comprehend the interactions between the robot and objects. However, prior works in the literature have focused mainly on low-level elementary robot actions, which typically result in lengthy, inefficient, and highly complex robot manipulation. In contrast, humans usually employ top–down thinking of high-level actions rather than bottom–up stacking of low-level ones. To address this limitation, we present a novel formulation for robot manipulation that can be accomplished by pick-and-place, a commonly applied high-level robot action, through grasping. We propose a novel visual predictive model that combines an action decomposer and a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model can accurately predict the object dynamics (i.e., the object movements under robot manipulation) while trained directly on observations of high-level pick-and-place actions. We also demonstrate that, together with a sampling-based planner, our model achieves a higher success rate using high-level actions on a variety of real robot manipulation tasks. Published version 2024-02-14T06:25:12Z 2024-02-14T06:25:12Z 2024 Journal Article Ma, A., Chi, G., Ivaldi, S. & Chen, L. (2024). Learning high-level robotic manipulation actions with visual predictive model. Complex and Intelligent Systems, 10(1), 811-823. 
https://dx.doi.org/10.1007/s40747-023-01174-5 2199-4536 https://hdl.handle.net/10356/173569 10.1007/s40747-023-01174-5 2-s2.0-85167340547 1 10 811 823 en Complex and Intelligent Systems © 2023 The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering Robot manipulation Visual foresight |
spellingShingle |
Engineering Robot manipulation Visual foresight Ma, Anji Chi, Guoyi Ivaldi, Serena Chen, Lipeng Learning high-level robotic manipulation actions with visual predictive model |
description |
Learning visual predictive models has great potential for real-world robot manipulations. Visual predictive models serve as a model of real-world dynamics to comprehend the interactions between the robot and objects. However, prior works in the literature have focused mainly on low-level elementary robot actions, which typically result in lengthy, inefficient, and highly complex robot manipulation. In contrast, humans usually employ top–down thinking of high-level actions rather than bottom–up stacking of low-level ones. To address this limitation, we present a novel formulation for robot manipulation that can be accomplished by pick-and-place, a commonly applied high-level robot action, through grasping. We propose a novel visual predictive model that combines an action decomposer and a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model can accurately predict the object dynamics (i.e., the object movements under robot manipulation) while trained directly on observations of high-level pick-and-place actions. We also demonstrate that, together with a sampling-based planner, our model achieves a higher success rate using high-level actions on a variety of real robot manipulation tasks. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Ma, Anji Chi, Guoyi Ivaldi, Serena Chen, Lipeng |
format |
Article |
author |
Ma, Anji Chi, Guoyi Ivaldi, Serena Chen, Lipeng |
author_sort |
Ma, Anji |
title |
Learning high-level robotic manipulation actions with visual predictive model |
title_short |
Learning high-level robotic manipulation actions with visual predictive model |
title_full |
Learning high-level robotic manipulation actions with visual predictive model |
title_fullStr |
Learning high-level robotic manipulation actions with visual predictive model |
title_full_unstemmed |
Learning high-level robotic manipulation actions with visual predictive model |
title_sort |
learning high-level robotic manipulation actions with visual predictive model |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173569 |
_version_ |
1794549419196022784 |