Learning high-level robotic manipulation actions with visual predictive model

Learning visual predictive models has great potential for real-world robot manipulation. Visual predictive models serve as a model of real-world dynamics, allowing a robot to reason about its interactions with objects. However, prior works in the literature have focused mainly on low-level elementary robot actions, which typically result in lengthy, inefficient, and highly complex manipulation plans. In contrast, humans usually employ top-down reasoning over high-level actions rather than bottom-up stacking of low-level ones. To address this limitation, we present a novel formulation for robot manipulation tasks that can be accomplished through pick-and-place, a commonly applied high-level robot action built on grasping. We propose a novel visual predictive model that combines an action decomposer and a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model can accurately predict object dynamics (i.e., object movements under robot manipulation) when trained directly on observations of high-level pick-and-place actions. We also demonstrate that, together with a sampling-based planner, our model achieves a higher success rate on a variety of real robot manipulation tasks when planning over high-level actions.
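
For intuition, the sketch below shows one common pattern for the pipeline the abstract describes: a sampling-based planner (here, a cross-entropy-method loop) that searches over the parameters of a high-level pick-and-place action, scoring each candidate by rolling it through a learned video prediction model and comparing the imagined outcome to a goal. This is a minimal illustration, not the paper's implementation: the callables `predict_frames` and `goal_cost`, the four-parameter pick/place encoding, and all hyperparameters are assumptions made for the example.

```python
# Hypothetical sketch: CEM-style planning over high-level pick-and-place
# actions using a learned visual predictive model. predict_frames and
# goal_cost are assumed interfaces, not the authors' actual API.
import numpy as np

def plan_pick_and_place(predict_frames, goal_cost, current_frame,
                        n_samples=64, n_elites=8, n_iters=3, seed=0):
    """Search for (pick_x, pick_y, place_x, place_y) in normalized
    workspace coordinates that minimizes the predicted goal cost.

    predict_frames(frame, action) -> predicted future frames
    goal_cost(frames) -> scalar distance of the imagined outcome to the goal
    """
    rng = np.random.default_rng(seed)
    # Initialize the sampling distribution over the 4 action parameters.
    mean, std = np.full(4, 0.5), np.full(4, 0.25)
    for _ in range(n_iters):
        # Sample candidate high-level actions and keep them in-bounds.
        actions = rng.normal(mean, std, size=(n_samples, 4)).clip(0.0, 1.0)
        # Score each candidate by imagining its outcome with the model.
        costs = np.array([goal_cost(predict_frames(current_frame, a))
                          for a in actions])
        # Refit the distribution to the lowest-cost (elite) candidates.
        elites = actions[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean  # best pick-and-place parameters found
```

In visual-foresight-style systems the cost is typically a pixel- or feature-space distance between the predicted frames and a goal image; the paper's specific cost function and action parameterization may differ.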

Bibliographic Details
Main Authors: Ma, Anji; Chi, Guoyi; Ivaldi, Serena; Chen, Lipeng
Other Authors: School of Electrical and Electronic Engineering
Format: Journal Article
Language: English
Published: 2024
Published in: Complex and Intelligent Systems, 10(1), 811-823
ISSN: 2199-4536
DOI: 10.1007/s40747-023-01174-5
Subjects: Engineering; Robot manipulation; Visual foresight
Online Access:https://hdl.handle.net/10356/173569
Institution: Nanyang Technological University
Rights: © 2023 The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).