JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution

Bibliographic Details
Main Authors: Li, Hongshan, Hu, Chenghao, Jiang, Jingyan, Wang, Zhi, Wen, Yonggang, Zhu, Wenwu
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2020
Online Access:https://hdl.handle.net/10356/143195
Institution: Nanyang Technological University
Description
Summary: Recent years have witnessed rapid growth in deep-network-based services and applications. A practical and critical problem has thus emerged: how to deploy deep neural network models so that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data-center servers, incurring large latency because a significant amount of data must be transferred from the network edge to the data center. In this paper, we propose JALAD, a joint accuracy- and latency-aware execution framework that decouples a deep neural network so that one part runs on edge devices and the other part in the conventional cloud, while only a minimal amount of data is transferred between them. Though the idea seems straightforward, we face several challenges: i) how to find the best partition of a deep structure; ii) how to deploy the edge-side component on a device with only limited computation power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD, including 1) a normalization-based in-layer data compression strategy that jointly considers compression rate and model accuracy; 2) a latency-aware deep decoupling strategy that minimizes the overall execution latency; and 3) an edge-cloud structure adaptation strategy that dynamically changes the decoupling under different network conditions. Experiments demonstrate that our solution significantly reduces execution latency: it speeds up overall inference execution while keeping model accuracy loss within a guaranteed bound.
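The core decision the abstract describes — choosing where to split a network between edge and cloud so that total latency is minimized — can be illustrated with a minimal sketch. This is not the paper's algorithm; it assumes hypothetical per-layer latency profiles (`edge_ms`, `cloud_ms`), boundary tensor sizes (`sizes_kb`), and a fixed network bandwidth, and simply enumerates all split points:

```python
# Minimal sketch of latency-aware DNN partitioning for edge-cloud
# execution. All profiles and numbers are illustrative, not from JALAD.

def best_partition(edge_ms, cloud_ms, sizes_kb, bandwidth_kbps):
    """Pick the split point with the lowest end-to-end latency.

    edge_ms[i], cloud_ms[i]: latency of layer i on the edge / in the cloud.
    sizes_kb[k]: data crossing the boundary when the first k layers run on
    the edge (sizes_kb[0] is the raw input, sent when everything runs in
    the cloud). Returns (split, total_latency_ms).
    """
    n = len(edge_ms)
    best = None
    for split in range(n + 1):
        edge_part = sum(edge_ms[:split])              # layers on the edge
        cloud_part = sum(cloud_ms[split:])            # layers in the cloud
        transfer = sizes_kb[split] / bandwidth_kbps * 1000.0  # ms on the wire
        total = edge_part + transfer + cloud_part
        if best is None or total < best[1]:
            best = (split, total)
    return best

# Example: activations shrink with depth, so a mid-network split can beat
# both the all-cloud (split=0) and all-edge (split=4) extremes.
split, latency = best_partition(
    edge_ms=[5, 8, 20, 60],
    cloud_ms=[1, 1, 2, 3],
    sizes_kb=[600, 300, 50, 40, 4],
    bandwidth_kbps=1000,
)
```

In this toy profile the search selects a split after the second layer, since later activations are small enough that shipping them to the cloud costs less than running the heavy later layers on the weak edge device; the paper's adaptation strategy would re-run such a decision as bandwidth changes.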