COSM2IC: Optimizing real-time multi-modal instruction comprehension

Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge to be tackled in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded de...

Bibliographic Details
Main Authors: WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON; SUBBARAJU, Vigneshwaran; TRAN, Minh Anh Tuan; MISRA, Archan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
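Subjects: Deep Learning for Visual Perception; Data Sets for Robotic Vision; Embedded Systems for Robotic and Automation; Human-Robot Collaboration; RGB-D Perception; Artificial Intelligence and Robotics; Databases and Information Systems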
Online Access: https://ink.library.smu.edu.sg/sis_research/7618
https://ink.library.smu.edu.sg/context/sis_research/article/8621/viewcontent/iros_final.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8621
record_format dspace
spelling sg-smu-ink.sis_research-8621
2022-12-22T03:23:57Z
2022-07-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/7618
info:doi/10.1109/LRA.2022.3194683
https://ink.library.smu.edu.sg/context/sis_research/article/8621/viewcontent/iros_final.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Deep Learning for Visual Perception
Data Sets for Robotic Vision
Embedded Systems for Robotic and Automation
Human-Robot Collaboration
RGB-D Perception
Artificial Intelligence and Robotics
Databases and Information Systems
description Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded devices. While model compression can reduce computational resource usage up to a certain point, further optimization results in a severe drop in accuracy. To minimize this loss in accuracy, we propose the COSM2IC framework, whose lightweight Task Complexity Predictor uses multiple sensor inputs to assess instructional complexity and thereby dynamically switches between a set of models of varying computational intensity, such that computationally less demanding models are invoked whenever possible. To demonstrate the benefits of COSM2IC, we use a representative human-robot collaborative “table-top target acquisition” task to curate a new multi-modal instruction dataset in which a human issues instructions in a natural manner using a combination of visual, verbal, and gestural (pointing) cues. We show that COSM2IC achieves a 3-fold reduction in comprehension latency compared to a baseline DNN model while suffering an accuracy loss of only ~5%. Compared to state-of-the-art model compression methods, COSM2IC achieves a further 30% reduction in latency and energy consumption for comparable performance.
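The switching idea in the description (estimate how complex an instruction is, then invoke the cheapest model expected to handle it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Instruction` fields, the hand-written scoring rule in `predict_complexity`, the thresholds, and the model names are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instruction:
    """Hypothetical summary of one multi-modal instruction (not the paper's format)."""
    num_objects: int    # objects visible on the table (visual cue)
    num_words: int      # length of the spoken instruction (verbal cue)
    has_pointing: bool  # whether a pointing gesture was detected (gestural cue)


def predict_complexity(instr: Instruction) -> float:
    """Toy stand-in for a lightweight complexity predictor; returns a score in [0, 1].

    The weights are arbitrary illustrative values, not learned or published ones.
    """
    score = 0.05 * instr.num_objects + 0.03 * instr.num_words
    if instr.has_pointing:
        score -= 0.2  # assumption: a pointing cue makes the target easier to resolve
    return min(1.0, max(0.0, score))


def select_model(complexity: float, models: List[Tuple[float, str]]) -> str:
    """Return the first (cheapest) model whose complexity ceiling covers the score.

    `models` must be ordered from least to most computationally demanding.
    """
    for max_complexity, name in models:
        if complexity <= max_complexity:
            return name
    return models[-1][1]  # fall back to the most capable model


# Hypothetical model ladder: (complexity ceiling, model name).
MODELS = [(0.3, "small"), (0.6, "medium"), (1.0, "large")]

if __name__ == "__main__":
    easy = Instruction(num_objects=3, num_words=4, has_pointing=True)
    hard = Instruction(num_objects=12, num_words=15, has_pointing=False)
    print(select_model(predict_complexity(easy), MODELS))
    print(select_model(predict_complexity(hard), MODELS))
```

In this sketch the cheap model is chosen whenever the score falls under its ceiling, which mirrors the stated goal of invoking less demanding models whenever possible; a real predictor would be learned from sensor data rather than hand-coded.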
format text
author WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON
SUBBARAJU, Vigneshwaran
TRAN, Minh Anh Tuan
MISRA, Archan
author_sort WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON
title COSM2IC: Optimizing real-time multi-modal instruction comprehension
title_sort cosm2ic: optimizing real-time multi-modal instruction comprehension
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7618
https://ink.library.smu.edu.sg/context/sis_research/article/8621/viewcontent/iros_final.pdf
_version_ 1770576395588599808