Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration
Crossbar-based In-Memory Computing (IMC) accelerators preload the entire Deep Neural Network (DNN) into crossbars before inference. However, devices with limited crossbars cannot infer increasingly complex models. IMC-pruning can reduce the usage of crossbars, but current methods need expensive extra hardware for data alignment. Meanwhile, quantization can represent the weights of DNNs by integers, but existing schemes employ non-integer scaling factors to ensure accuracy, requiring costly multipliers. In this paper, we first propose crossbar-aligned pruning to reduce the usage of crossbars without hardware overhead. Then, we introduce a quantization scheme that avoids multipliers in IMC devices. Finally, we design a learning method that combines the above two schemes and cultivates an optimal compact DNN with high accuracy and large sparsity during training. Experiments demonstrate that our framework, compared to state-of-the-art methods, achieves larger sparsity and lower power consumption with higher accuracy. We even improve accuracy by 0.43% for VGG-16 with an 88.25% sparsity rate on the CIFAR-10 dataset. Compared to the original model, we reduce computing power and area by 19.8x and 18.8x, respectively.
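To make the abstract's two ideas concrete, the sketch below shows one plausible reading of them: pruning weights in whole crossbar-sized column blocks (so the pruned model needs no extra alignment hardware) and quantizing with a power-of-two scaling factor (so rescaling becomes a bit-shift rather than a multiplication). The crossbar width, the block-pruning criterion, and the power-of-two scaling are illustrative assumptions; this record does not describe the authors' actual algorithm.

```python
import numpy as np

# Hypothetical illustration (not the authors' code): prune a weight matrix in
# crossbar-aligned column blocks, then quantize with a power-of-two scale so
# dequantization needs only an arithmetic shift instead of a multiplier.

XBAR_COLS = 64  # assumed crossbar width


def crossbar_aligned_prune(w, sparsity):
    """Zero out whole crossbar-sized column blocks with the smallest L2 norm."""
    out = w.copy()
    n_blocks = w.shape[1] // XBAR_COLS
    norms = [np.linalg.norm(w[:, i * XBAR_COLS:(i + 1) * XBAR_COLS])
             for i in range(n_blocks)]
    n_prune = int(sparsity * n_blocks)
    for i in np.argsort(norms)[:n_prune]:
        out[:, i * XBAR_COLS:(i + 1) * XBAR_COLS] = 0.0
    return out


def pow2_quantize(w, bits=8):
    """Quantize to signed integers with a power-of-two scale (shift-friendly)."""
    qmax = 2 ** (bits - 1) - 1
    shift = int(np.ceil(np.log2(np.abs(w).max() / qmax)))  # scale = 2**shift
    q = np.clip(np.round(w / 2.0 ** shift), -qmax - 1, qmax).astype(np.int8)
    return q, shift  # dequantize: q * 2**shift, i.e. a shift, no multiplier


w = np.random.randn(128, 256).astype(np.float32)
q, shift = pow2_quantize(crossbar_aligned_prune(w, sparsity=0.5))
```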
Saved in:
Main Authors: | Huai, Shuo; Liu, Di; Luo, Xiangzhong; Chen, Hui; Liu, Weichen; Subramaniam, Ravi |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies; In-Memory Computing; Pruning; Quantization; Neural Networks |
Online Access: | https://hdl.handle.net/10356/165352 |
Institution: | Nanyang Technological University |
Citation: | Huai, S., Liu, D., Luo, X., Chen, H., Liu, W. & Subramaniam, R. (2023). Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration. 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023), 234-239. https://dx.doi.org/10.1145/3566097.3567856 |
Conference: | 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023) |
DOI: | 10.1145/3566097.3567856 |
ISBN: | 978-1-4503-9783-4 |
Research Data: | 10.21979/N9/OVGZZ1 |
Funding: | This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc., through the HP-NTU Digital Manufacturing Corporate Lab (I1801E0028). This work is also partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282). |
Rights: | © 2023 Association for Computing Machinery. All rights reserved. |