FAT: an in-memory accelerator with fast addition for ternary weight neural networks
Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited attention. TWNs are inefficient on existing IMC devices because their sparsity is not well utilized and their addition operations are not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which exploits the sparsity of TWNs to skip null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier that avoids the time overhead of both carry propagation and writing the carry back to memory cells. Third, we propose a Combined-Stationary data mapping that reduces the data movement of activations and weights and increases parallelism across memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00× speedup, 1.22× power efficiency and 1.22× area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02× speedup and 12.19× energy efficiency compared with ParaPIM.
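As a point of reference for the abstract, the minimal C sketch below (not part of the paper; the function name, loop structure, and int8 activation type are assumptions for illustration) shows why ternary weights turn multiply-accumulate into sign-controlled additions and why zero weights can be skipped outright, which is the property FAT's Sparse Addition Control Unit exploits.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical software analogue of a ternary-weight dot product.
 * With weights restricted to {-1, 0, +1}, every "multiply" collapses
 * into an add, a subtract, or a skip; the skip branch on zero weights
 * mirrors the null-operation skipping done in hardware by FAT's
 * Sparse Addition Control Unit. */
static int32_t ternary_dot(const int8_t *act, const int8_t *w, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i] == 0) {
            continue;            /* zero weight: no addition is issued at all */
        } else if (w[i] > 0) {
            acc += act[i];       /* weight +1: accumulate the activation */
        } else {
            acc -= act[i];       /* weight -1: subtract the activation */
        }
    }
    return acc;
}

int main(void)
{
    const int8_t act[] = { 3, -5, 7, 2, -1, 4 };
    const int8_t w[]   = { 1,  0, -1, 0,  1, 0 };  /* 50% zero weights in this toy case */
    printf("dot = %d\n", ternary_dot(act, w, sizeof w / sizeof w[0]));  /* 3 - 7 - 1 = -5 */
    return 0;
}
```

In the accelerator described above, the equivalent additions are performed in memory at the Sense Amplifier level rather than by a processor loop; the sketch only mirrors the arithmetic, not the hardware.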
Main Authors: | Zhu, Shien; Duong, Luan H. K.; Chen, Hui; Liu, Di; Liu, Weichen |
---|---|
Other Authors: | School of Computer Science and Engineering; Parallel and Distributed Computing Centre; HP-NTU Digital Manufacturing Corporate Lab |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures; Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; Ternary Weight Neural Network; In-Memory Computing; Convolutional Neural Network; Spin-Transfer Torque Magnetic Random-Access Memory |
Online Access: | https://hdl.handle.net/10356/162483 |
Institution: | Nanyang Technological University |
Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Citation: | Zhu, S., Duong, L. H. K., Chen, H., Liu, D. & Liu, W. (2022). FAT: an in-memory accelerator with fast addition for ternary weight neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2022.3184276 |
ISSN: | 0278-0070 |
DOI: | 10.1109/TCAD.2022.3184276 |
Scopus ID: | 2-s2.0-85132696672 |
Related DOI: | 10.21979/N9/DYKUPV |
Version: | Submitted/Accepted version |
Funding: | This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087). |
Rights: | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCAD.2022.3184276. |