CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation
Crossbar-based In-Memory Processing (IMP) accelerators have been widely adopted to achieve high-speed and low-power computing, especially for deep neural network (DNN) models with numerous weights and high computational complexity. However, floating-point (FP) arithmetic is not compatible with crossbar architectures. Moreover, the redundant weights of current DNN models occupy too many crossbars, limiting the efficiency of crossbar accelerators. Meanwhile, due to the inherent non-ideal behavior of crossbar devices, such as write variations, pre-trained DNN models suffer accuracy degradation when deployed on a crossbar-based IMP accelerator for inference. Although some approaches have been proposed to address these issues, they often fail to consider the interactions among them and introduce significant hardware overhead for solving each one. To deploy complex models on IMP accelerators, we should compact the model and mitigate the influence of device non-ideal behaviors without introducing significant overhead from each technique. In this paper, we first propose to reuse the bit-shift units in crossbars to approximately multiply the scaling factors in our quantization scheme, avoiding the need for FP processors. Second, we propose kernel-group pruning and crossbar pruning to eliminate the hardware units for data alignment. We also design a zerorize-recover training process for our pruning method to achieve higher accuracy. Third, we adopt runtime-aware non-ideality adaptation with a self-compensation scheme to relieve the impact of non-ideality by exploiting the features of crossbars. Finally, we integrate these three optimization procedures into one training process to form a comprehensive learning framework for co-optimization, which achieves higher accuracy. The experimental results indicate that our comprehensive learning framework obtains significant improvements over the original model when inferring on a crossbar-based IMP accelerator, with average reductions in computing power and computing area of 100.02× and 17.37×, respectively. Furthermore, we obtain fully integer-only, pruned, and reliable VGG-16 and ResNet-56 models for the CIFAR-10 dataset on IMP accelerators, with accuracy drops of only 2.19% and 1.26%, respectively, without any hardware overhead.
Saved in:
Main Authors: Huai, Shuo; Kong, Hao; Luo, Xiangzhong; Li, Shiqing; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
Other Authors: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; In-Memory Processing; Pruning
Online Access: https://hdl.handle.net/10356/171633
Institution: Nanyang Technological University
id |
sg-ntu-dr.10356-171633 |
---|---|
record_format |
dspace |
spelling |
Record ID: sg-ntu-dr.10356-171633 (last indexed 2023-11-03T15:36:37Z)
Title: CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation
Authors: Huai, Shuo; Kong, Hao; Luo, Xiangzhong; Li, Shiqing; Subramaniam, Ravi; Makaya, Christian; Lin, Qian; Liu, Weichen
Affiliations: School of Computer Science and Engineering; HP-NTU Digital Manufacturing Corporate Lab
Subjects: Engineering::Computer science and engineering; In-Memory Processing; Pruning
Abstract: see the description field below.
Note: Published version.
Record dates: accessioned 2023-11-01T08:55:50Z; available 2023-11-01T08:55:50Z; issued 2023
Type: Journal Article
Citation: Huai, S., Kong, H., Luo, X., Li, S., Subramaniam, R., Makaya, C., Lin, Q. & Liu, W. (2023). CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation. ACM Transactions on Embedded Computing Systems, 22(5s), 123. https://dx.doi.org/10.1145/3609115
Journal: ACM Transactions on Embedded Computing Systems
ISSN: 1539-9087
Handle: https://hdl.handle.net/10356/171633
DOI: 10.1145/3609115
Scopus ID: 2-s2.0-85171805904
Language: en
Grants: IAF-ICP; I1801E0028; NAP (M4082282/04INS000515C130)
Funding: This work is partially supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282/04INS000515C130).
Rights: © 2023 Copyright held by the owner/author(s). This is an open-access article distributed under the terms of the Creative Commons License.
File format: application/pdf
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering In-Memory Processing Pruning |
description |
Crossbar-based In-Memory Processing (IMP) accelerators have been widely adopted to achieve high-speed and low-power computing, especially for deep neural network (DNN) models with numerous weights and high computational complexity. However, floating-point (FP) arithmetic is not compatible with crossbar architectures. Moreover, the redundant weights of current DNN models occupy too many crossbars, limiting the efficiency of crossbar accelerators. Meanwhile, due to the inherent non-ideal behavior of crossbar devices, such as write variations, pre-trained DNN models suffer accuracy degradation when deployed on a crossbar-based IMP accelerator for inference. Although some approaches have been proposed to address these issues, they often fail to consider the interactions among them and introduce significant hardware overhead for solving each one. To deploy complex models on IMP accelerators, we should compact the model and mitigate the influence of device non-ideal behaviors without introducing significant overhead from each technique. In this paper, we first propose to reuse the bit-shift units in crossbars to approximately multiply the scaling factors in our quantization scheme, avoiding the need for FP processors. Second, we propose kernel-group pruning and crossbar pruning to eliminate the hardware units for data alignment. We also design a zerorize-recover training process for our pruning method to achieve higher accuracy. Third, we adopt runtime-aware non-ideality adaptation with a self-compensation scheme to relieve the impact of non-ideality by exploiting the features of crossbars. Finally, we integrate these three optimization procedures into one training process to form a comprehensive learning framework for co-optimization, which achieves higher accuracy.
The experimental results indicate that our comprehensive learning framework obtains significant improvements over the original model when inferring on a crossbar-based IMP accelerator, with average reductions in computing power and computing area of 100.02× and 17.37×, respectively. Furthermore, we obtain fully integer-only, pruned, and reliable VGG-16 and ResNet-56 models for the CIFAR-10 dataset on IMP accelerators, with accuracy drops of only 2.19% and 1.26%, respectively, without any hardware overhead. |
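The abstract describes two crossbar-oriented compression ideas: a quantization scheme whose scaling factors can be applied with bit shifts (so no FP multiplier is needed), and group-wise pruning that zeroes whole kernel groups so pruned weights align with whole crossbar columns. The sketch below is illustrative only and is not the authors' implementation; the function names (`shift_quantize`, `prune_kernel_groups`), shapes, group size, and keep ratio are all hypothetical choices made for the example.

```python
# Illustrative sketch (NOT the paper's code):
# (1) quantization with a power-of-two scale 2**-s, so rescaling integer
#     partial sums reduces to a bit shift;
# (2) magnitude-based kernel-group pruning that zeroes whole groups of rows,
#     so the pruned weights map onto whole crossbar columns.
import numpy as np

def shift_quantize(w, bits=8):
    """Quantize to signed integers with a power-of-two scale 2**-s."""
    qmax = 2 ** (bits - 1) - 1
    # Choose s so that max|w| * 2**s fits just within the integer range.
    s = int(np.floor(np.log2(qmax / np.abs(w).max())))
    q = np.clip(np.round(w * (2.0 ** s)), -qmax, qmax).astype(np.int32)
    return q, s  # dequantize as q * 2**-s, i.e. an arithmetic shift by s

def prune_kernel_groups(w, group=4, keep_ratio=0.5):
    """Zero the weakest kernel groups (rows grouped along axis 0)."""
    g = w.reshape(w.shape[0] // group, group, -1)
    scores = np.abs(g).sum(axis=(1, 2))        # L1 score per group
    k = max(1, int(len(scores) * keep_ratio))  # number of groups to keep
    mask = scores >= np.sort(scores)[-k]       # keep the k strongest groups
    return (g * mask[:, None, None]).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 9)).astype(np.float32)
q, s = shift_quantize(w)
w_hat = q * (2.0 ** -s)                        # bit-shift dequantization
print("max quantization error:", np.abs(w - w_hat).max())
pruned = prune_kernel_groups(w)
print("zeroed rows:", int((np.abs(pruned).sum(axis=1) == 0).sum()))
```

Because the scale is constrained to a power of two, applying it to an integer accumulator needs only shift hardware, which is the property the paper exploits when reusing the crossbar's bit-shift units; real implementations would also fold batch-norm and activation scales into the same constraint.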
author2 |
School of Computer Science and Engineering |
format |
Article |
author |
Huai, Shuo Kong, Hao Luo, Xiangzhong Li, Shiqing Subramaniam, Ravi Makaya, Christian Lin, Qian Liu, Weichen |
author_sort |
Huai, Shuo |
title |
CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/171633 |
_version_ |
1781793888109854720 |