EvoLP: self-evolving latency predictor for model compression in real-time edge systems
Edge devices are increasingly utilized for deploying deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices necessitate latency-targeted neural network compression. However, measuring latency on real devices is challenging and expensive. Therefore, this letter presents a novel and efficient framework, named EvoLP, to accurately predict the inference latency of models on edge devices. This predictor can evolve to achieve higher latency prediction precision during the network compression process. Experimental results on three edge devices and four model variants demonstrate that EvoLP outperforms previous state-of-the-art approaches. Moreover, when incorporated into a model compression framework, it effectively guides the compression process toward higher model accuracy while satisfying strict latency constraints. We open source EvoLP at https://github.com/ntuliuteam/EvoLP.
Saved in: DR-NTU (NTU Library)
Main Authors: Huai, Shuo; Kong, Hao; Li, Shiqing; Luo, Xiangzhong; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
Other Authors: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Predictive Models; Hardware
Online Access: https://hdl.handle.net/10356/171636
Institution: Nanyang Technological University
Journal: IEEE Embedded Systems Letters (ISSN 1943-0663)
DOI: 10.1109/LES.2023.3321599
Version: Submitted/Accepted version
Citation: Huai, S., Kong, H., Li, S., Luo, X., Subramaniam, R., Makaya, C., Lin, Q. & Liu, W. (2023). EvoLP: self-evolving latency predictor for model compression in real-time edge systems. IEEE Embedded Systems Letters. https://dx.doi.org/10.1109/LES.2023.3321599
Funding: This work is partially supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028), and partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP.
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/LES.2023.3321599.
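The abstract describes a predictor that "evolves": as the compression search occasionally measures candidate compressed models on the real device, those measurements are folded back into the predictor so its precision improves around the configurations being explored. The sketch below is a minimal illustration of that loop only; the class and function names, the random-forest regressor, and `measure_on_device` are assumptions made for illustration, not the authors' implementation (see https://github.com/ntuliuteam/EvoLP for the actual code).

```python
# Illustrative sketch of an evolving latency predictor; not the authors' code.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def measure_on_device(config):
    """Hypothetical stand-in for a real on-device latency measurement (ms)."""
    # In practice: deploy the compressed model described by `config`
    # on the target edge device and time its inference.
    raise NotImplementedError

class EvolvingLatencyPredictor:
    def __init__(self):
        self.model = RandomForestRegressor(n_estimators=100)
        self.X, self.y = [], []

    def fit_initial(self, configs, latencies):
        # Seed the predictor with an initial set of measured samples.
        self.X, self.y = list(configs), list(latencies)
        self.model.fit(np.array(self.X), np.array(self.y))

    def predict(self, config):
        # Cheap latency estimate for a candidate compressed model.
        return float(self.model.predict(np.array([config]))[0])

    def evolve(self, config, measured_latency):
        # Fold a newly measured sample back in, so prediction precision
        # improves in the region the compression search is exploring.
        self.X.append(config)
        self.y.append(measured_latency)
        self.model.fit(np.array(self.X), np.array(self.y))
```

Under these assumptions, a compression loop would score most candidates cheaply with `predictor.predict(config)` and only occasionally measure one on the real device and pass the result to `evolve`, keeping expensive device access rare while the predictor tracks the search.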