EdgeCompress: coupling multi-dimensional model compression and dynamic inference for EdgeAI

Convolutional neural networks (CNNs) have demonstrated encouraging results in image classification tasks. However, their prohibitive computational cost hinders deployment onto resource-constrained embedded devices. To address this issue, we propose EdgeCompress, a comprehensive compression fram...

Bibliographic Details
Main Authors: Kong, Hao; Liu, Di; Huai, Shuo; Luo, Xiangzhong; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Embedded Systems; Neural Network Compression
Online Access: https://hdl.handle.net/10356/171623
Institution: Nanyang Technological University
id sg-ntu-dr.10356-171623
record_format dspace
spelling sg-ntu-dr.10356-171623 2023-12-15T03:20:53Z
EdgeCompress: coupling multi-dimensional model compression and dynamic inference for EdgeAI
Kong, Hao; Liu, Di; Huai, Shuo; Luo, Xiangzhong; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Engineering::Computer science and engineering; Embedded Systems; Neural Network Compression
Ministry of Education (MOE); Nanyang Technological University
Submitted/Accepted version
This study is partially supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028). This work is also partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP (M4082282).
2023-11-01T07:20:46Z 2023-11-01T07:20:46Z 2023
Journal Article
Kong, H., Liu, D., Huai, S., Luo, X., Subramaniam, R., Makaya, C., Lin, Q. & Liu, W. (2023). EdgeCompress: coupling multi-dimensional model compression and dynamic inference for EdgeAI. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2023.3276938
0278-0070
https://hdl.handle.net/10356/171623
10.1109/TCAD.2023.3276938
2-s2.0-85162902626
en
IAF-ICP I1801E0028; MOE2019-T2-1-071; NAP (M4082282)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
10.21979/N9/GCAMZH
© 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/TCAD.2023.3276938.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Embedded Systems
Neural Network Compression
description Convolutional neural networks (CNNs) have demonstrated encouraging results in image classification tasks. However, their prohibitive computational cost hinders deployment onto resource-constrained embedded devices. To address this issue, we propose EdgeCompress, a comprehensive compression framework to reduce the computational overhead of CNNs. In EdgeCompress, we first introduce dynamic image cropping, where we design a lightweight foreground predictor to accurately crop the most informative foreground object of input images for inference, which avoids redundant computation on background regions. Subsequently, we present compound shrinking to collaboratively compress the three dimensions (depth, width, and resolution) of CNNs according to their contribution to accuracy and model computation. Dynamic image cropping and compound shrinking together constitute a multi-dimensional CNN compression framework, which comprehensively reduces the computational redundancy in both input images and neural network architectures, thereby improving the inference efficiency of CNNs. Further, we present a dynamic inference framework to efficiently process input images with different recognition difficulties: we cascade multiple models with different complexities from our compression framework and dynamically adopt different models for different input images. This further reduces computational redundancy and improves the inference efficiency of CNNs, facilitating the deployment of advanced CNNs onto embedded hardware. Experiments on ImageNet-1K demonstrate that EdgeCompress reduces the computation of ResNet-50 by 48.8% while improving the top-1 accuracy by 0.8%. Meanwhile, we improve the accuracy by 4.1% with similar computation compared to HRank, the state-of-the-art compression framework. The source code and models are available at
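The dynamic inference idea in the description (cascading models of different complexity and choosing one per input) can be illustrated with a minimal sketch. This is not the authors' implementation: the exit rule used here (stop when the top class probability reaches a fixed `threshold`) is an assumption, since the record does not say how a model is selected, and `cascade_predict`, `small_model`, and `large_model` are hypothetical names with toy outputs.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable that maps an input to class probabilities.
Model = Callable[[object], Sequence[float]]


def cascade_predict(models: List[Model], x: object,
                    threshold: float = 0.8) -> Tuple[int, int]:
    """Try models from cheapest to most expensive and stop at the first
    confident one. Returns (predicted_class, index_of_model_used).

    Assumption: confidence is the top class probability compared against
    a fixed threshold; the paper's actual criterion may differ.
    """
    if not models:
        raise ValueError("cascade needs at least one model")
    for i, model in enumerate(models):
        probs = model(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        # The last model always answers, so every input gets a prediction.
        if probs[best] >= threshold or i == len(models) - 1:
            return best, i
    raise AssertionError("unreachable")


# Toy stand-ins for a small and a large compressed model (hypothetical).
def small_model(x):
    return [0.9, 0.1] if x == "easy" else [0.55, 0.45]


def large_model(x):
    return [0.2, 0.8]


print(cascade_predict([small_model, large_model], "easy"))  # small model suffices
print(cascade_predict([small_model, large_model], "hard"))  # falls through to large
```

In this sketch an easy input only pays for the small model, while a hard input pays for both, so the average cost depends on how many inputs exit early.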
author2 School of Computer Science and Engineering
format Article
author Kong, Hao
Liu, Di
Huai, Shuo
Luo, Xiangzhong
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
Liu, Weichen
author_sort Kong, Hao
title EdgeCompress: coupling multi-dimensional model compression and dynamic inference for EdgeAI
publishDate 2023
url https://hdl.handle.net/10356/171623
_version_ 1787136556246499328