Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
Deep learning applications have been widely adopted on edge devices, to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. Particularly, many application scenarios are real-time and have a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method to optimize models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor to optimize this procedure to learn a model that satisfies the latency constraint by only a one-shot training process. The experiment results reveal that, compared to state-of-the-art methods, our approach can well-fit the ‘hard’ latency constraint and achieve high accuracy. Under the same training settings as the original model and satisfying a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet's latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, our method can be further improved to only 0.04% drop for GoogLeNet. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms and even improve its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms and achieve higher accuracy by 0.78%. We also open source this framework at https://github.com/ntuliuteam/ZeroBN.
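The "zerorized batch normalization" in the title hints at a channel-pruning idea: when a batch-normalization channel's scale (gamma) and shift (beta) are both set to zero, that channel's output is identically zero, so the channel contributes nothing downstream and can be physically removed to cut latency. The paper's actual training procedure is not described in this record; below is only a minimal numpy sketch of the standard BN formulation with one channel zeroed out, as an illustration of the mechanism.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-channel batch normalization over a (batch, channels) array."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))   # batch of 8 samples, 4 channels
gamma = np.ones(4)
beta = np.zeros(4)

# "Zerorize" channel 2: with gamma = beta = 0 the channel's output is
# exactly zero for every input, so later layers receive no signal from
# it and the channel can be pruned without changing the network's output.
gamma[2] = 0.0
beta[2] = 0.0

y = batch_norm(x, gamma, beta)
print(np.allclose(y[:, 2], 0.0))  # prints True
```

The other channels still carry normalized activations; only the zerorized one is dead, which is what makes BN scale factors a convenient handle for structured (channel-level) pruning.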
Saved in:
Main Authors: Huai, Shuo; Liu, Di; Kong, Hao; Liu, Weichen; Subramaniam, Ravi; Makaya, Christian; Lin, Qian
Other Authors: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering::Software::Software engineering; Latency Optimization; Edge Device; Latency Prediction
Online Access: https://hdl.handle.net/10356/165565
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-165565
record_format: dspace
Citation: Huai, S., Liu, D., Kong, H., Liu, W., Subramaniam, R., Makaya, C. & Lin, Q. (2023). Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization. Future Generation Computer Systems, 142, 314-327. https://dx.doi.org/10.1016/j.future.2022.12.021
ISSN: 0167-739X
DOI: 10.1016/j.future.2022.12.021
Scopus: 2-s2.0-85146436384
Handle: https://hdl.handle.net/10356/165565
Volume/Pages: 142, 314-327
Published: 2023 (Journal Article); record created 2023-03-31
Grant: I1801E0028
Funding: This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028).
Source code: https://github.com/ntuliuteam/ZeroBN
Rights: © 2022 Elsevier B.V. All rights reserved.
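The abstract also mentions a "universal hardware-customized latency predictor" that lets the training loop estimate inference latency without repeatedly benchmarking on the device. Its actual form is not given in this record; the sketch below is a generic stand-in fitted with numpy least squares on made-up profiling numbers, assuming (purely for illustration) that a conv layer's latency scales roughly with the product of its input and output channel counts.

```python
import numpy as np

# Hypothetical on-device profiling data for one conv layer:
# (input_channels, output_channels) -> measured latency in ms.
samples = np.array([
    [16, 16], [16, 32], [32, 32], [32, 64], [64, 64], [64, 128],
], dtype=float)
latency_ms = np.array([1.1, 1.9, 3.4, 6.2, 11.8, 22.9])

# Fit latency ~ a * (c_in * c_out) + b. The channel-count product is a
# crude proxy for conv compute cost; a real predictor would be calibrated
# per target device, which is what "hardware-customized" suggests.
features = np.stack([samples[:, 0] * samples[:, 1],
                     np.ones(len(samples))], axis=1)
coef, *_ = np.linalg.lstsq(features, latency_ms, rcond=None)

def predict_latency(c_in, c_out):
    """Estimate layer latency (ms) for a candidate channel configuration."""
    return coef[0] * c_in * c_out + coef[1]

# A pruning loop can now query the predictor instead of the device:
print(round(predict_latency(48, 48), 2))  # estimated ms for a 48x48 config
```

With such a model, checking whether a candidate pruned architecture meets a hard latency budget (e.g. 34 ms end to end) becomes a cheap arithmetic query inside the training loop rather than a device round-trip.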
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering::Software::Software engineering; Latency Optimization; Edge Device; Latency Prediction