Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration
Crossbar-based In-Memory Computing (IMC) accelerators preload the entire Deep Neural Network (DNN) into crossbars before inference. However, devices with limited crossbars cannot infer increasingly complex models. IMC-pruning can reduce the usage of crossbars, but current methods need expensive extra hardware for data alignment. Meanwhile, quantization can represent the weights of DNNs by integers, but existing schemes employ non-integer scaling factors to ensure accuracy, requiring costly multipliers. In this paper, we first propose crossbar-aligned pruning to reduce the usage of crossbars without hardware overhead. Then, we introduce a quantization scheme that avoids multipliers in IMC devices. Finally, we design a learning method that combines the above two schemes and cultivates an optimal compact DNN with high accuracy and large sparsity during training. Experiments demonstrate that our framework, compared to state-of-the-art methods, achieves larger sparsity and lower power consumption with higher accuracy. We even improve accuracy by 0.43% for VGG-16 with an 88.25% sparsity rate on the CIFAR-10 dataset. Compared to the original model, we reduce computing power and area by 19.8x and 18.8x, respectively.
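To make the abstract's two ideas concrete, the sketch below shows one plausible reading of them: pruning weights in whole crossbar-sized column blocks (so the pruned model needs no extra alignment hardware) and quantizing with a power-of-two scaling factor (so rescaling becomes a bit-shift rather than a multiplication). The crossbar width, the block-pruning criterion, and the power-of-two scaling are illustrative assumptions; this record does not describe the authors' actual algorithm.

```python
import numpy as np

# Hypothetical illustration (not the authors' code): prune a weight matrix in
# crossbar-aligned column blocks, then quantize with a power-of-two scale so
# dequantization needs only an arithmetic shift instead of a multiplier.

XBAR_COLS = 64  # assumed crossbar width


def crossbar_aligned_prune(w, sparsity):
    """Zero out whole crossbar-sized column blocks with the smallest L2 norm."""
    out = w.copy()
    n_blocks = w.shape[1] // XBAR_COLS
    norms = [np.linalg.norm(w[:, i * XBAR_COLS:(i + 1) * XBAR_COLS])
             for i in range(n_blocks)]
    n_prune = int(sparsity * n_blocks)
    for i in np.argsort(norms)[:n_prune]:
        out[:, i * XBAR_COLS:(i + 1) * XBAR_COLS] = 0.0
    return out


def pow2_quantize(w, bits=8):
    """Quantize to signed integers with a power-of-two scale (shift-friendly)."""
    qmax = 2 ** (bits - 1) - 1
    shift = int(np.ceil(np.log2(np.abs(w).max() / qmax)))  # scale = 2**shift
    q = np.clip(np.round(w / 2.0 ** shift), -qmax - 1, qmax).astype(np.int8)
    return q, shift  # dequantize: q * 2**shift, i.e. a shift, no multiplier


w = np.random.randn(128, 256).astype(np.float32)
q, shift = pow2_quantize(crossbar_aligned_prune(w, sparsity=0.5))
```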
Saved in:
Main Authors: | Huai, Shuo; Liu, Di; Luo, Xiangzhong; Chen, Hui; Liu, Weichen; Subramaniam, Ravi |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies; In-Memory Computing; Pruning; Quantization; Neural Networks |
Online Access: | https://hdl.handle.net/10356/165352 |
Institution: | Nanyang Technological University |
Citation: | Huai, S., Liu, D., Luo, X., Chen, H., Liu, W. & Subramaniam, R. (2023). Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration. 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023), 234-239. https://dx.doi.org/10.1145/3566097.3567856 |
Conference: | 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023) |
DOI: | 10.1145/3566097.3567856 |
ISBN: | 978-1-4503-9783-4 |
Research Data: | 10.21979/N9/OVGZZ1 |
Funding: | This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028). This work is also partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282). |
Rights: | © 2023 Association for Computing Machinery. All rights reserved. |