Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization


Bibliographic Details
Main Authors: Huai, Shuo, Liu, Di, Kong, Hao, Liu, Weichen, Subramaniam, Ravi, Makaya, Christian, Lin, Qian
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/165565
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-165565
record_format dspace
spelling sg-ntu-dr.10356-1655652023-03-31T02:44:39Z Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization Huai, Shuo Liu, Di Kong, Hao Liu, Weichen Subramaniam, Ravi Makaya, Christian Lin, Qian School of Computer Science and Engineering HP-NTU Digital Manufacturing Corporate Lab Engineering::Computer science and engineering::Software::Software engineering Latency Optimization Edge Device Latency Prediction Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and have a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method to optimize models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that enables this procedure to learn a model satisfying the latency constraint with only a one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach can tightly fit the ‘hard’ latency constraint while achieving high accuracy. Under the same training settings as the original model and satisfying a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet's latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, the accuracy drop is further reduced to only 0.04% for GoogLeNet. 
On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms while even improving its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms, gaining 0.78% in accuracy. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN. This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028). 2023-03-31T02:44:39Z 2023-03-31T02:44:39Z 2023 Journal Article Huai, S., Liu, D., Kong, H., Liu, W., Subramaniam, R., Makaya, C. & Lin, Q. (2023). Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization. Future Generation Computer Systems, 142, 314-327. https://dx.doi.org/10.1016/j.future.2022.12.021 0167-739X https://hdl.handle.net/10356/165565 10.1016/j.future.2022.12.021 2-s2.0-85146436384 142 314 327 en I1801E0028 Future Generation Computer Systems © 2022 Elsevier B.V. All rights reserved.
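The "zerorized batch normalization" idea named in the title can be illustrated with a minimal sketch: channel pruning driven by batch-normalization scale factors (gamma), where the globally smallest scales are zeroed until a latency estimate meets the budget. Everything below — the layer sizes, the linear per-channel latency model, and the threshold search — is an illustrative assumption, not the paper's actual implementation (see the linked ZeroBN repository for that).

```python
import numpy as np

# Hypothetical per-layer BN scale factors (gamma) for a 3-layer network.
# In BN-based channel pruning, channels whose gamma is (near) zero
# contribute almost nothing to the output and can be removed.
rng = np.random.default_rng(0)
gammas = [np.abs(rng.normal(size=n)) for n in (64, 128, 256)]

def predicted_latency(channel_counts, cost_per_channel=(0.1, 0.08, 0.05)):
    """Toy linear latency model: latency grows with surviving channels."""
    return sum(c * k for c, k in zip(channel_counts, cost_per_channel))

budget_ms = 25.0  # the 'hard' latency constraint

# Raise a global threshold over the sorted gammas until the predicted
# latency of the surviving channels fits the budget.
flat = np.sort(np.concatenate(gammas))
for threshold in flat:
    counts = [int((g > threshold).sum()) for g in gammas]
    if predicted_latency(counts) <= budget_ms:
        break

# Zero out ("zerorize") the pruned channels' scale factors.
pruned = [np.where(g > threshold, g, 0.0) for g in gammas]
```

In an actual training loop, zeroing these gammas during training (rather than after it) is what lets a single one-shot training run produce a model that already satisfies the constraint.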
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Software::Software engineering
Latency Optimization
Edge Device
Latency Prediction
spellingShingle Engineering::Computer science and engineering::Software::Software engineering
Latency Optimization
Edge Device
Latency Prediction
Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
description Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and have a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method to optimize models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that enables this procedure to learn a model satisfying the latency constraint with only a one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach can tightly fit the ‘hard’ latency constraint while achieving high accuracy. Under the same training settings as the original model and satisfying a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet's latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, the accuracy drop is further reduced to only 0.04% for GoogLeNet. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms while even improving its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms, gaining 0.78% in accuracy. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN.
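The "universal hardware-customized latency predictor" mentioned in the description can be sketched as a profile-then-fit model: time a few layer configurations on the target device once, fit a cheap regression, and query it during training instead of the hardware. The configurations, the made-up timings, and the single-feature linear fit below are all hypothetical, not the predictor the paper proposes.

```python
import numpy as np

# Made-up one-time profiling results on a target edge device:
# (in_channels, out_channels) -> measured latency in ms.
measured = {
    (32, 64): 1.9,
    (64, 128): 4.1,
    (128, 256): 8.3,
    (256, 512): 16.5,
}

# Single feature: in_channels * out_channels, a dominant cost term
# for a convolutional layer. Fit a least-squares line through it.
x = np.array([cin * cout for cin, cout in measured], dtype=float)
y = np.array(list(measured.values()))
slope, intercept = np.polyfit(x, y, 1)

def predict_latency(cin, cout):
    """Estimate layer latency (ms) without touching the device."""
    return slope * cin * cout + intercept

# Query an unseen configuration instead of benchmarking it on-device.
est = predict_latency(96, 192)
```

The point of such a predictor is that the architecture search can evaluate candidate channel counts thousands of times per epoch, which would be far too slow if each evaluation required a real on-device measurement.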
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
format Article
author Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
author_sort Huai, Shuo
title Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_short Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_full Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_fullStr Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_full_unstemmed Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
title_sort latency-constrained dnn architecture learning for edge systems using zerorized batch normalization
publishDate 2023
url https://hdl.handle.net/10356/165565
_version_ 1762031122182569984