CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation
Crossbar-based In-Memory Processing (IMP) accelerators have been widely adopted to achieve high-speed and low-power computing, especially for deep neural network (DNN) models with numerous weights and high computational complexity. However, floating-point (FP) arithmetic is not compatible with crossbar architectures. Moreover, the redundant weights of current DNN models occupy too many crossbars, limiting the efficiency of crossbar accelerators. Meanwhile, due to the inherent non-ideal behavior of crossbar devices, such as write variations, pre-trained DNN models suffer accuracy degradation when deployed on a crossbar-based IMP accelerator for inference. Although some approaches have been proposed to address these issues, they often fail to consider the interactions among them and introduce significant hardware overhead for solving each one. To deploy complex models on IMP accelerators, we should compact the model and mitigate the influence of device non-ideal behaviors without introducing significant overhead from each technique. In this paper, we first propose to reuse the bit-shift units in crossbars to approximately multiply the scaling factors in our quantization scheme, avoiding the need for FP processors. Second, we propose kernel-group pruning and crossbar pruning to eliminate the hardware units for data alignment. We also design a zerorize-recover training process for our pruning method to achieve higher accuracy. Third, we adopt runtime-aware non-ideality adaptation with a self-compensation scheme to relieve the impact of non-ideality by exploiting the features of crossbars. Finally, we integrate these three optimization procedures into one training process to form a comprehensive learning framework for co-optimization, which achieves higher accuracy. The experimental results indicate that our comprehensive learning framework obtains significant improvements over the original model when inferring on a crossbar-based IMP accelerator, with average reductions in computing power and computing area of 100.02× and 17.37×, respectively. Furthermore, we obtain fully integer-only, pruned, and reliable VGG-16 and ResNet-56 models for the CIFAR-10 dataset on IMP accelerators, with accuracy drops of only 2.19% and 1.26%, respectively, without any hardware overhead.
Saved in:
Main Authors: Huai, Shuo; Kong, Hao; Luo, Xiangzhong; Li, Shiqing; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
Other Authors: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; In-Memory Processing; Pruning
Online Access: https://hdl.handle.net/10356/171633
Institution: Nanyang Technological University
id |
sg-ntu-dr.10356-171633 |
---|---|
record_format |
dspace |
spelling |
Record ID: sg-ntu-dr.10356-171633 (last indexed 2023-11-03T15:36:37Z)
Title: CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation
Authors: Huai, Shuo; Kong, Hao; Luo, Xiangzhong; Li, Shiqing; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
Affiliations: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Subjects: Engineering::Computer science and engineering; In-Memory Processing; Pruning
Abstract: see the description field below.
Note: Published version.
Record dates: accessioned 2023-11-01T08:55:50Z; available 2023-11-01T08:55:50Z; issued 2023
Type: Journal Article
Citation: Huai, S., Kong, H., Luo, X., Li, S., Subramaniam, R., Makaya, C., Lin, Q. & Liu, W. (2023). CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation. ACM Transactions on Embedded Computing Systems, 22(5s), 123. https://dx.doi.org/10.1145/3609115
Journal: ACM Transactions on Embedded Computing Systems
ISSN: 1539-9087
Handle: https://hdl.handle.net/10356/171633
DOI: 10.1145/3609115
Scopus ID: 2-s2.0-85171805904
Language: en
Grants: IAF-ICP; I1801E0028; NAP (M4082282/04INS000515C130)
Funding: This work is partially supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282/04INS000515C130).
Rights: © 2023 Copyright held by the owner/author(s). This is an open-access article distributed under the terms of the Creative Commons License.
File format: application/pdf
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering In-Memory Processing Pruning |
description |
Crossbar-based In-Memory Processing (IMP) accelerators have been widely adopted to achieve high-speed and low-power computing, especially for deep neural network (DNN) models with numerous weights and high computational complexity. However, floating-point (FP) arithmetic is not compatible with crossbar architectures. Moreover, the redundant weights of current DNN models occupy too many crossbars, limiting the efficiency of crossbar accelerators. Meanwhile, due to the inherent non-ideal behavior of crossbar devices, such as write variations, pre-trained DNN models suffer accuracy degradation when deployed on a crossbar-based IMP accelerator for inference. Although some approaches have been proposed to address these issues, they often fail to consider the interactions among them and introduce significant hardware overhead for solving each one. To deploy complex models on IMP accelerators, we should compact the model and mitigate the influence of device non-ideal behaviors without introducing significant overhead from each technique. In this paper, we first propose to reuse the bit-shift units in crossbars to approximately multiply the scaling factors in our quantization scheme, avoiding the need for FP processors. Second, we propose kernel-group pruning and crossbar pruning to eliminate the hardware units for data alignment. We also design a zerorize-recover training process for our pruning method to achieve higher accuracy. Third, we adopt runtime-aware non-ideality adaptation with a self-compensation scheme to relieve the impact of non-ideality by exploiting the features of crossbars. Finally, we integrate these three optimization procedures into one training process to form a comprehensive learning framework for co-optimization, which achieves higher accuracy.
The experimental results indicate that our comprehensive learning framework obtains significant improvements over the original model when inferring on a crossbar-based IMP accelerator, with average reductions in computing power and computing area of 100.02× and 17.37×, respectively. Furthermore, we obtain fully integer-only, pruned, and reliable VGG-16 and ResNet-56 models for the CIFAR-10 dataset on IMP accelerators, with accuracy drops of only 2.19% and 1.26%, respectively, without any hardware overhead. |
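The abstract describes two crossbar-oriented compression ideas: a quantization scheme whose scaling factors can be applied with bit shifts (so no FP multiplier is needed), and group-wise pruning that zeroes whole kernel groups so pruned weights align with whole crossbar columns. The sketch below is illustrative only and is not the authors' implementation; the function names (`shift_quantize`, `prune_kernel_groups`), shapes, group size, and keep ratio are all hypothetical choices made for the example.

```python
# Illustrative sketch (NOT the paper's code):
# (1) quantization with a power-of-two scale 2**-s, so rescaling integer
#     partial sums reduces to a bit shift;
# (2) magnitude-based kernel-group pruning that zeroes whole groups of rows,
#     so the pruned weights map onto whole crossbar columns.
import numpy as np

def shift_quantize(w, bits=8):
    """Quantize to signed integers with a power-of-two scale 2**-s."""
    qmax = 2 ** (bits - 1) - 1
    # Choose s so that max|w| * 2**s fits just within the integer range.
    s = int(np.floor(np.log2(qmax / np.abs(w).max())))
    q = np.clip(np.round(w * (2.0 ** s)), -qmax, qmax).astype(np.int32)
    return q, s  # dequantize as q * 2**-s, i.e. an arithmetic shift by s

def prune_kernel_groups(w, group=4, keep_ratio=0.5):
    """Zero the weakest kernel groups (rows grouped along axis 0)."""
    g = w.reshape(w.shape[0] // group, group, -1)
    scores = np.abs(g).sum(axis=(1, 2))        # L1 score per group
    k = max(1, int(len(scores) * keep_ratio))  # number of groups to keep
    mask = scores >= np.sort(scores)[-k]       # keep the k strongest groups
    return (g * mask[:, None, None]).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 9)).astype(np.float32)
q, s = shift_quantize(w)
w_hat = q * (2.0 ** -s)                        # bit-shift dequantization
print("max quantization error:", np.abs(w - w_hat).max())
pruned = prune_kernel_groups(w)
print("zeroed rows:", int((np.abs(pruned).sum(axis=1) == 0).sum()))
```

Because the scale is constrained to a power of two, applying it to an integer accumulator needs only shift hardware, which is the property the paper exploits when reusing the crossbar's bit-shift units; real implementations would also fold batch-norm and activation scales into the same constraint.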
author2 |
School of Computer Science and Engineering |
format |
Article |
author |
Huai, Shuo Kong, Hao Luo, Xiangzhong Li, Shiqing Subramaniam, Ravi Makaya, Christian Lin, Qian Liu, Weichen |
author_sort |
Huai, Shuo |
title |
CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/171633 |
_version_ |
1781793888109854720 |