CAS: Fusing DNN optimization & adaptive sensing for energy-efficient multi-modal inference

Bibliographic Details
Main Authors:	WEERAKOON, Dulanga, SUBBARAJU, Vigneshwaran, LIM, Joo Hwee, MISRA, Archan
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Deep Learning for Visual Perception; Embedded Systems for Robotic and Automation; Human-Robot Collaboration; RGB-D Perception; Vision and Sensor-Based Control; Artificial Intelligence and Robotics
Online Access:	https://ink.library.smu.edu.sg/sis_research/9360 https://ink.library.smu.edu.sg/context/sis_research/article/10360/viewcontent/CAS_camready.pdf
Institution:	Singapore Management University

Description
Summary:	Intelligent virtual agents are used to accomplish complex multi-modal tasks such as human instruction comprehension in mixed-reality environments by increasingly adopting richer, energy-intensive sensors and processing pipelines. In such applications, the context for activating sensors and processing blocks required to accomplish a given task instance is usually manifested via multiple sensing modes. Based on this observation, we introduce a novel Commit-and-Switch (CAS) paradigm that simultaneously seeks to reduce both sensing and processing energy. In CAS, we first commit to a low-energy computational pipeline with a subset of available sensors. Then, the task context estimated by this pipeline is used to optionally switch to another energy-intensive DNN pipeline and activate additional sensors. We demonstrate how CAS's paradigm of interweaving DNN computation and sensor triggering can be instantiated principally by constructing multi-head DNN models and jointly optimizing the accuracy and sensing costs associated with different heads. We exemplify CAS via the development of the RealGIN-MH model for multi-modal target acquisition tasks, a core enabler of immersive human-agent interaction. RealGIN-MH achieves a 12.9x reduction in energy overheads, while outperforming baseline dynamic model optimization approaches.
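
The commit-then-optionally-switch control flow described in the summary can be illustrated with a minimal Python sketch. This is not the authors' RealGIN-MH implementation. The `Pipeline` class, the `commit_and_switch` function, the `read_sensor` callback, the energy figures, and the use of a confidence score with a fixed `switch_threshold` as a stand-in for the "estimated task context" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Readings = Dict[str, float]


@dataclass
class Pipeline:
    """Hypothetical inference pipeline (illustrative, not from the paper)."""
    name: str
    sensors: Tuple[str, ...]   # sensors this pipeline needs
    energy: float              # made-up energy cost per inference
    infer: Callable[[Readings], Tuple[str, float]]  # -> (prediction, confidence)


def commit_and_switch(low: Pipeline, high: Pipeline,
                      read_sensor: Callable[[str], float],
                      switch_threshold: float = 0.8):
    """Run the low-energy pipeline first; switch only if it is not confident.

    Assumption: confidence below `switch_threshold` stands in for a task
    context that calls for the energy-intensive pipeline.
    """
    # Commit: activate only the sensors the low-energy pipeline needs.
    readings = {s: read_sensor(s) for s in low.sensors}
    prediction, confidence = low.infer(readings)
    used, spent = low.name, low.energy

    # Switch: activate the additional sensors and run the costlier pipeline.
    if confidence < switch_threshold:
        for s in high.sensors:
            if s not in readings:
                readings[s] = read_sensor(s)
        prediction, confidence = high.infer(readings)
        used, spent = high.name, spent + high.energy

    return prediction, used, spent


# Toy usage with fake sensor values and stub models.
fake_sensors = {"rgb": 0.9, "depth": 0.4}
low = Pipeline("low", ("rgb",), 1.0, lambda r: ("object_A", r["rgb"]))
high = Pipeline("high", ("rgb", "depth"), 10.0, lambda r: ("object_B", 0.99))
print(commit_and_switch(low, high, fake_sensors.__getitem__))
```

In this toy setup the extra sensor is read, and the higher energy cost is paid, only when the low-energy pipeline reports low confidence. The summary describes a multi-head DNN whose heads are jointly optimized for accuracy and sensing cost, which this sketch does not attempt to model.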

Similar Items