Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration

Bibliographic Details
Main Authors: Huai, Shuo, Liu, Di, Luo, Xiangzhong, Chen, Hui, Liu, Weichen, Subramaniam, Ravi
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Subjects:
Online Access: https://hdl.handle.net/10356/165352
Institution: Nanyang Technological University
Description
Summary: Crossbar-based In-Memory Computing (IMC) accelerators preload the entire Deep Neural Network (DNN) into crossbars before inference. However, devices with a limited number of crossbars cannot run inference for increasingly complex models. IMC-pruning can reduce crossbar usage, but current methods require expensive extra hardware for data alignment. Meanwhile, quantization can represent DNN weights as integers, but existing schemes employ non-integer scaling factors to preserve accuracy, which requires costly multipliers. In this paper, we first propose crossbar-aligned pruning to reduce crossbar usage without hardware overhead. Then, we introduce a quantization scheme that avoids multipliers in IMC devices. Finally, we design a learning method that carries out the above two schemes jointly and cultivates an optimal compact DNN with high accuracy and high sparsity during training. Experiments demonstrate that, compared with state-of-the-art methods, our framework achieves higher sparsity and lower power consumption with higher accuracy. We even improve accuracy by 0.43% for VGG-16 at an 88.25% sparsity rate on the CIFAR-10 dataset. Compared to the original model, we reduce computing power and area by 19.8x and 18.8x, respectively.
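
The summary names two concrete mechanisms: pruning that stays aligned to crossbar tile boundaries (so pruned weights need no data-alignment hardware) and integer quantization whose scaling factors avoid multipliers. The Python sketch below is a rough illustration of those two ideas only, not the authors' actual algorithm: it prunes a weight matrix in whole crossbar-sized tiles and quantizes with a power-of-two scale so that rescaling reduces to a bit shift. The 128x128 crossbar size, the tile-wise L1 ranking, and all function names are assumptions.

```python
import numpy as np

XBAR = 128  # assumed crossbar dimension; the paper's hardware may differ


def crossbar_aligned_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out whole XBAR x XBAR tiles of a weight matrix.

    Because pruning removes tiles that coincide with how the matrix is
    mapped onto crossbars, no extra data-alignment hardware is needed.
    Tiles are ranked by L1 norm (an assumed criterion) and the
    lowest-scoring fraction `sparsity` is dropped.
    """
    r_tiles = -(-w.shape[0] // XBAR)  # ceil(rows / XBAR)
    c_tiles = -(-w.shape[1] // XBAR)  # ceil(cols / XBAR)
    padded = np.zeros((r_tiles * XBAR, c_tiles * XBAR), dtype=w.dtype)
    padded[: w.shape[0], : w.shape[1]] = w
    # View the padded matrix as a grid of crossbar tiles.
    tiles = padded.reshape(r_tiles, XBAR, c_tiles, XBAR)
    scores = np.abs(tiles).sum(axis=(1, 3))  # L1 norm of each tile
    k = int(sparsity * scores.size)          # number of tiles to prune
    if k > 0:
        cutoff = np.partition(scores.ravel(), k - 1)[k - 1]
        keep = scores > cutoff               # drop tiles at or below cutoff
        tiles *= keep[:, None, :, None]      # mask broadcast over tile entries
    return padded[: w.shape[0], : w.shape[1]]


def integer_only_quantize(w: np.ndarray, bits: int = 8):
    """Quantize weights to integers with a power-of-two scaling factor.

    With scale = 2**shift, rescaling needs only a bit shift, avoiding the
    multipliers that non-integer scaling factors require. Returns the
    int8 weights and the exponent `shift`.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.abs(w).max())
    # Smallest power of two whose quantized range covers the weights.
    shift = int(np.ceil(np.log2(max_abs / qmax))) if max_abs > 0 else 0
    q = np.clip(np.round(w / 2.0**shift), -qmax - 1, qmax).astype(np.int8)
    return q, shift  # dequantize as q * 2**shift (a shift, not a multiply)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((512, 512)).astype(np.float32)
    w_pruned = crossbar_aligned_prune(w, sparsity=0.75)
    q, shift = integer_only_quantize(w_pruned)
    print(f"sparsity: {np.mean(w_pruned == 0):.2%}, scale exponent: {shift}")
```

Since trained weights are typically below 1 in magnitude, the returned exponent is usually negative, so the hardware rescale is a right shift. The paper's joint training-time learning method that combines the two schemes is not reproduced here.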