COSM2IC: Optimizing real-time multi-modal instruction comprehension
Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge to be tackled in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded de...
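Main Authors: | WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON; SUBBARAJU, Vigneshwaran; TRAN, Minh Anh Tuan; MISRA, Archan |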
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
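Subjects: | Deep Learning for Visual Perception; Data Sets for Robotic Vision; Embedded Systems for Robotic and Automation; Human-Robot Collaboration; RGB-D Perception; Artificial Intelligence and Robotics; Databases and Information Systems |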
Online Access: | https://ink.library.smu.edu.sg/sis_research/7618 https://ink.library.smu.edu.sg/context/sis_research/article/8621/viewcontent/iros_final.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-8621 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-8621 2022-12-22T03:23:57Z COSM2IC: Optimizing real-time multi-modal instruction comprehension WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON; SUBBARAJU, Vigneshwaran; TRAN, Minh Anh Tuan; MISRA, Archan Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge to be tackled in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded devices. While model compression can achieve a reduction in computational resources up to a certain point, further optimizations result in a severe drop in accuracy. To minimize this loss in accuracy, we propose the COSM2IC framework, with a lightweight Task Complexity Predictor, that uses multiple sensor inputs to assess the instructional complexity and thereby dynamically switch between a set of models of varying computational intensity such that computationally less demanding models are invoked whenever possible. To demonstrate the benefits of COSM2IC, we utilize a representative human-robot collaborative “table-top target acquisition” task to curate a new multi-modal instruction dataset where a human issues instructions in a natural manner using a combination of visual, verbal, and gestural (pointing) cues. We show that COSM2IC achieves a 3-fold reduction in comprehension latency when compared to a baseline DNN model while suffering an accuracy loss of only ∼5%. When compared to state-of-the-art model compression methods, COSM2IC is able to achieve a further 30% reduction in latency and energy consumption for comparable performance.
2022-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7618 info:doi/10.1109/LRA.2022.3194683 https://ink.library.smu.edu.sg/context/sis_research/article/8621/viewcontent/iros_final.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Deep Learning for Visual Perception; Data Sets for Robotic Vision; Embedded Systems for Robotic and Automation; Human-Robot Collaboration; RGB-D Perception; Artificial Intelligence and Robotics; Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Deep Learning for Visual Perception; Data Sets for Robotic Vision; Embedded Systems for Robotic and Automation; Human-Robot Collaboration; RGB-D Perception; Artificial Intelligence and Robotics; Databases and Information Systems |
spellingShingle |
Deep Learning for Visual Perception; Data Sets for Robotic Vision; Embedded Systems for Robotic and Automation; Human-Robot Collaboration; RGB-D Perception; Artificial Intelligence and Robotics; Databases and Information Systems WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON; SUBBARAJU, Vigneshwaran; TRAN, Minh Anh Tuan; MISRA, Archan COSM2IC: Optimizing real-time multi-modal instruction comprehension |
description |
Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge to be tackled in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded devices. While model compression can achieve a reduction in computational resources up to a certain point, further optimizations result in a severe drop in accuracy. To minimize this loss in accuracy, we propose the COSM2IC framework, with a lightweight Task Complexity Predictor, that uses multiple sensor inputs to assess the instructional complexity and thereby dynamically switch between a set of models of varying computational intensity such that computationally less demanding models are invoked whenever possible. To demonstrate the benefits of COSM2IC, we utilize a representative human-robot collaborative “table-top target acquisition” task to curate a new multi-modal instruction dataset where a human issues instructions in a natural manner using a combination of visual, verbal, and gestural (pointing) cues. We show that COSM2IC achieves a 3-fold reduction in comprehension latency when compared to a baseline DNN model while suffering an accuracy loss of only ∼5%. When compared to state-of-the-art model compression methods, COSM2IC is able to achieve a further 30% reduction in latency and energy consumption for comparable performance. |
format |
text |
author |
WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON; SUBBARAJU, Vigneshwaran; TRAN, Minh Anh Tuan; MISRA, Archan |
author_facet |
WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON; SUBBARAJU, Vigneshwaran; TRAN, Minh Anh Tuan; MISRA, Archan |
author_sort |
WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON, |
title |
COSM2IC: Optimizing real-time multi-modal instruction comprehension |
title_short |
COSM2IC: Optimizing real-time multi-modal instruction comprehension |
title_full |
COSM2IC: Optimizing real-time multi-modal instruction comprehension |
title_fullStr |
COSM2IC: Optimizing real-time multi-modal instruction comprehension |
title_full_unstemmed |
COSM2IC: Optimizing real-time multi-modal instruction comprehension |
title_sort |
cosm2ic: optimizing real-time multi-modal instruction comprehension |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2022 |
url |
https://ink.library.smu.edu.sg/sis_research/7618 https://ink.library.smu.edu.sg/context/sis_research/article/8621/viewcontent/iros_final.pdf |
_version_ |
1770576395588599808 |
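
The description in this record says that a lightweight Task Complexity Predictor assesses each instruction and switches between models of varying computational intensity, so that cheaper models run whenever possible. The following is only a minimal sketch of that dispatch pattern, not the authors' implementation. Every name in it (`Instruction`, `predict_complexity`, `select_model`, `TIERS`), the hand-written scoring heuristic, and the thresholds are assumptions made for illustration; the record does not say how the predictor is built or what features it uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Instruction:
    """Hypothetical summary of one multi-modal instruction."""
    num_objects: int    # objects visible on the table (visual cue)
    num_words: int      # length of the spoken instruction (verbal cue)
    has_pointing: bool  # whether a pointing gesture was detected


# A "model" here is any callable that maps an instruction to a target label.
Model = Callable[[Instruction], str]

# Assumed tiers: (upper bound on complexity score, model name), cheapest first.
TIERS: List[Tuple[float, str]] = [(0.4, "light"), (0.7, "medium"), (1.0, "full")]


def predict_complexity(inst: Instruction) -> float:
    """Stand-in for the predictor: a hand-written heuristic score in [0, 1]."""
    score = 0.05 * inst.num_objects + 0.03 * inst.num_words
    if not inst.has_pointing:
        score += 0.3  # assumption: no gesture makes the reference harder
    return min(score, 1.0)


def select_model(score: float, tiers: List[Tuple[float, str]]) -> str:
    """Return the cheapest tier whose upper bound covers the score."""
    for upper, name in tiers:
        if score <= upper:
            return name
    return tiers[-1][1]  # fall back to the most capable model


def comprehend(inst: Instruction, models: Dict[str, Model]) -> Tuple[str, str]:
    """Route the instruction to the selected model; return (tier, output)."""
    name = select_model(predict_complexity(inst), TIERS)
    return name, models[name](inst)


if __name__ == "__main__":
    # Dummy models that only report which tier handled the instruction.
    models: Dict[str, Model] = {
        n: (lambda inst, n=n: f"target-from-{n}") for _, n in TIERS
    }
    print(comprehend(Instruction(2, 4, True), models))
    print(comprehend(Instruction(10, 12, False), models))
```

In a real system the heuristic would be replaced by a trained predictor, and the thresholds would be tuned against the latency and accuracy trade-off that the description reports.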