Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration

Crossbar-based In-Memory Computing (IMC) accelerators preload the entire Deep Neural Network (DNN) into crossbars before inference. However, devices with limited crossbars cannot infer increasingly complex models. IMC-pruning can reduce the usage of crossbars, but current methods need expensive extra hardware for data alignment. Meanwhile, quantization can represent the weights of DNNs as integers, but existing schemes employ non-integer scaling factors to preserve accuracy, requiring costly multipliers. In this paper, we first propose crossbar-aligned pruning to reduce the usage of crossbars without hardware overhead. Then, we introduce a quantization scheme that avoids multipliers in IMC devices. Finally, we design a learning method that combines the above two schemes and cultivates an optimal compact DNN with high accuracy and large sparsity during training. Experiments demonstrate that our framework, compared to state-of-the-art methods, achieves larger sparsity and lower power consumption with higher accuracy. We even improve accuracy by 0.43% for VGG-16 at an 88.25% sparsity rate on the CIFAR-10 dataset. Compared to the original model, we reduce computing power and area by 19.8x and 18.8x, respectively.
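The abstract names the two techniques but gives no implementation detail. The sketch below is a purely illustrative PyTorch reading of both ideas: structured pruning that zeroes whole crossbar-sized input groups (so freed weights vacate entire crossbar rows and no data-realignment hardware is needed), and quantization with a power-of-two scaling factor (so rescaling becomes a bit-shift rather than a multiply). The crossbar height of 128, the per-group L2-norm pruning criterion, and the symmetric power-of-two scale are assumptions for illustration, not details taken from the paper.

# Hypothetical sketch (not the paper's released code): crossbar-aligned pruning
# and power-of-two "integer-only" quantization for a single fully connected layer.
import torch
import torch.nn.functional as F

XBAR_ROWS = 128  # assumed crossbar height (input channels mapped to wordlines)

def crossbar_aligned_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero whole input groups of size XBAR_ROWS so pruning frees entire crossbar
    # rows; the per-group L2-norm criterion is an illustrative choice.
    out_f, in_f = weight.shape
    pad = (-in_f) % XBAR_ROWS                       # pad input dim up to a crossbar multiple
    w = F.pad(weight, (0, pad))
    groups = w.view(out_f, -1, XBAR_ROWS)           # (out, n_groups, XBAR_ROWS)
    scores = groups.pow(2).sum(dim=(0, 2)).sqrt()   # one importance score per input group
    n_prune = int(sparsity * scores.numel())
    mask = torch.ones_like(groups)
    mask[:, scores.argsort()[:n_prune], :] = 0.0    # drop the least important groups
    return (groups * mask).reshape(out_f, -1)[:, :in_f]

def pow2_quantize(weight: torch.Tensor, n_bits: int = 8):
    # Quantize weights to integers with a power-of-two scale, so rescaling during
    # inference is a bit-shift instead of a floating-point multiply.
    qmax = 2 ** (n_bits - 1) - 1
    scale = 2.0 ** torch.ceil(torch.log2(weight.abs().max().clamp_min(1e-8) / qmax))
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale                                 # scale == 2**k for some integer k

# Toy usage on random weights (shapes are arbitrary).
w = torch.randn(64, 300)
q, scale = pow2_quantize(crossbar_aligned_prune(w, sparsity=0.5))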

Bibliographic Details
Main Authors: Huai, Shuo, Liu, Di, Luo, Xiangzhong, Chen, Hui, Liu, Weichen, Subramaniam, Ravi
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies; In-Memory Computing; Pruning; Quantization; Neural Networks
Online Access:https://hdl.handle.net/10356/165352
Institution: Nanyang Technological University
Conference: 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023)
Research Centre: HP-NTU Digital Manufacturing Corporate Lab
Citation: Huai, S., Liu, D., Luo, X., Chen, H., Liu, W. & Subramaniam, R. (2023). Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration. 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023), 234-239. https://dx.doi.org/10.1145/3566097.3567856
DOI: 10.1145/3566097.3567856
ISBN: 978-1-4503-9783-4
Handle: https://hdl.handle.net/10356/165352
Pages: 234-239
Funding Agencies: Ministry of Education (MOE); Nanyang Technological University
Funding: This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028). This work is also partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282).
Grant Numbers: I1801E0028; MOE2019-T2-1-071; MOE2019-T1-001-072; M4082282
Related DOI: 10.21979/N9/OVGZZ1
Rights: © 2023 Association for Computing Machinery. All rights reserved.