Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization


Bibliographic Details
Main Authors: Huai, Shuo, Liu, Di, Kong, Hao, Liu, Weichen, Subramaniam, Ravi, Makaya, Christian, Lin, Qian
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/165565
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-165565
record_format dspace
spelling sg-ntu-dr.10356-1655652023-03-31T02:44:39Z Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization Huai, Shuo Liu, Di Kong, Hao Liu, Weichen Subramaniam, Ravi Makaya, Christian Lin, Qian School of Computer Science and Engineering HP-NTU Digital Manufacturing Corporate Lab Engineering::Computer science and engineering::Software::Software engineering Latency Optimization Edge Device Latency Prediction Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and have a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method to optimize models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that enables this procedure to learn a model satisfying the latency constraint with only a one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach can tightly fit the ‘hard’ latency constraint while achieving high accuracy. Under the same training settings as the original model and satisfying a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet's latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, the accuracy drop is further reduced to only 0.04% for GoogLeNet. 
On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms while even improving its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms, gaining 0.78% in accuracy. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN. This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028). 2023-03-31T02:44:39Z 2023-03-31T02:44:39Z 2023 Journal Article Huai, S., Liu, D., Kong, H., Liu, W., Subramaniam, R., Makaya, C. & Lin, Q. (2023). Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization. Future Generation Computer Systems, 142, 314-327. https://dx.doi.org/10.1016/j.future.2022.12.021 0167-739X https://hdl.handle.net/10356/165565 10.1016/j.future.2022.12.021 2-s2.0-85146436384 142 314 327 en I1801E0028 Future Generation Computer Systems © 2022 Elsevier B.V. All rights reserved.
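The "zerorized batch normalization" idea named in the title can be illustrated with a minimal sketch: channel pruning driven by batch-normalization scale factors (gamma), where the globally smallest scales are zeroed until a latency estimate meets the budget. Everything below — the layer sizes, the linear per-channel latency model, and the threshold search — is an illustrative assumption, not the paper's actual implementation (see the linked ZeroBN repository for that).

```python
import numpy as np

# Hypothetical per-layer BN scale factors (gamma) for a 3-layer network.
# In BN-based channel pruning, channels whose gamma is (near) zero
# contribute almost nothing to the output and can be removed.
rng = np.random.default_rng(0)
gammas = [np.abs(rng.normal(size=n)) for n in (64, 128, 256)]

def predicted_latency(channel_counts, cost_per_channel=(0.1, 0.08, 0.05)):
    """Toy linear latency model: latency grows with surviving channels."""
    return sum(c * k for c, k in zip(channel_counts, cost_per_channel))

budget_ms = 25.0  # the 'hard' latency constraint

# Raise a global threshold over the sorted gammas until the predicted
# latency of the surviving channels fits the budget.
flat = np.sort(np.concatenate(gammas))
for threshold in flat:
    counts = [int((g > threshold).sum()) for g in gammas]
    if predicted_latency(counts) <= budget_ms:
        break

# Zero out ("zerorize") the pruned channels' scale factors.
pruned = [np.where(g > threshold, g, 0.0) for g in gammas]
```

In an actual training loop, zeroing these gammas during training (rather than after it) is what lets a single one-shot training run produce a model that already satisfies the constraint.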
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Software::Software engineering
Latency Optimization
Edge Device
Latency Prediction
spellingShingle Engineering::Computer science and engineering::Software::Software engineering
Latency Optimization
Edge Device
Latency Prediction
Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
description Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and have a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method to optimize models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that enables this procedure to learn a model satisfying the latency constraint with only a one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach can tightly fit the ‘hard’ latency constraint while achieving high accuracy. Under the same training settings as the original model and satisfying a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet's latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, the accuracy drop is further reduced to only 0.04% for GoogLeNet. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms while even improving its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms, gaining 0.78% in accuracy. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN.
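The "universal hardware-customized latency predictor" mentioned in the description can be sketched as a profile-then-fit model: time a few layer configurations on the target device once, fit a cheap regression, and query it during training instead of the hardware. The configurations, the made-up timings, and the single-feature linear fit below are all hypothetical, not the predictor the paper proposes.

```python
import numpy as np

# Made-up one-time profiling results on a target edge device:
# (in_channels, out_channels) -> measured latency in ms.
measured = {
    (32, 64): 1.9,
    (64, 128): 4.1,
    (128, 256): 8.3,
    (256, 512): 16.5,
}

# Single feature: in_channels * out_channels, a dominant cost term
# for a convolutional layer. Fit a least-squares line through it.
x = np.array([cin * cout for cin, cout in measured], dtype=float)
y = np.array(list(measured.values()))
slope, intercept = np.polyfit(x, y, 1)

def predict_latency(cin, cout):
    """Estimate layer latency (ms) without touching the device."""
    return slope * cin * cout + intercept

# Query an unseen configuration instead of benchmarking it on-device.
est = predict_latency(96, 192)
```

The point of such a predictor is that the architecture search can evaluate candidate channel counts thousands of times per epoch, which would be far too slow if each evaluation required a real on-device measurement.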
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
format Article
author Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
author_sort Huai, Shuo
title Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_short Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_full Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_fullStr Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_full_unstemmed Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_sort latency-constrained dnn architecture learning for edge systems using zerorized batch normalization
publishDate 2023
url https://hdl.handle.net/10356/165565
_version_ 1762031122182569984