EvoLP: self-evolving latency predictor for model compression in real-time edge systems

Edge devices are increasingly utilized for deploying deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices necessitate latency-targeted neural network compression. However, measuring latency on real devices is challenging and expensive. Therefore, this letter presents a novel and efficient framework, named EvoLP, to accurately predict the inference latency of models on edge devices. This predictor can evolve to achieve higher latency prediction precision during the network compression process. Experimental results on three edge devices and four model variants demonstrate that EvoLP outperforms previous state-of-the-art approaches. Moreover, when incorporated into a model compression framework, it effectively guides the compression process toward higher model accuracy while satisfying strict latency constraints. We open source EvoLP at https://github.com/ntuliuteam/EvoLP.

Bibliographic Details
Main Authors: Huai, Shuo, Kong, Hao, Li, Shiqing, Luo, Xiangzhong, Subramaniam, Ravi, Makaya, Christian, Lin, Qian, Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects: Engineering::Computer science and engineering; Predictive Models; Hardware
Online Access:https://hdl.handle.net/10356/171636
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-171636
Affiliations: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Subjects: Engineering::Computer science and engineering; Predictive Models; Hardware
Journal: IEEE Embedded Systems Letters
ISSN: 1943-0663
DOI: 10.1109/LES.2023.3321599
Scopus ID: 2-s2.0-85174850306
Citation: Huai, S., Kong, H., Li, S., Luo, X., Subramaniam, R., Makaya, C., Lin, Q. & Liu, W. (2023). EvoLP: self-evolving latency predictor for model compression in real-time edge systems. IEEE Embedded Systems Letters. https://dx.doi.org/10.1109/LES.2023.3321599
Version: Submitted/Accepted version
Funding: This work is partially supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028), and partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP.
Grant IDs: IAF-ICP; I1801E0028; MOE2019-T2-1-071; NAP
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/LES.2023.3321599.
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)
Date Deposited: 2023-11-02