FAT: an in-memory accelerator with fast addition for ternary weight neural networks

Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs offer higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited attention. TWNs run inefficiently on existing IMC devices because their sparsity is not well utilized and the addition operation itself is not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing the carry back to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and to increase the parallelism across memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00× speedup, 1.22× power efficiency and 1.22× area efficiency compared with a state-of-the-art IMC accelerator, ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02× speedup and 12.19× energy efficiency compared with ParaPIM.
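As background for the abstract above, the following is a minimal Python sketch (illustrative only, not code from the paper and not the FAT hardware) of why ternary weights turn a dot product into sign-controlled additions and how zero weights can simply be skipped, in the spirit of the Sparse Addition Control Unit. The threshold-based quantizer and all names here are assumptions made for illustration.

    import numpy as np

    def ternarize(weights, threshold=0.05):
        # Quantize real-valued weights to {-1, 0, +1}.
        # The fixed threshold is an illustrative assumption, not the paper's scheme.
        w = np.zeros_like(weights, dtype=np.int8)
        w[weights > threshold] = 1
        w[weights < -threshold] = -1
        return w

    def ternary_dot(activations, ternary_weights):
        # Multiplication-free dot product: each ternary weight selects
        # +activation, -activation, or nothing. Zero weights are skipped,
        # so higher sparsity means fewer additions.
        acc = 0
        for a, w in zip(activations, ternary_weights):
            if w == 0:
                continue          # skip null operations on zero weights
            acc += a if w == 1 else -a
        return acc

    # Toy usage: ternarize a small weight vector and evaluate the dot product.
    rng = np.random.default_rng(0)
    w_real = rng.normal(scale=0.1, size=8)
    acts = rng.integers(0, 16, size=8)    # e.g. unsigned 4-bit activations
    print(ternarize(w_real), ternary_dot(acts, ternarize(w_real)))

In FAT, the zero-skipping and the additions themselves are performed inside memory; this sketch only mirrors the arithmetic at the algorithm level.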

Bibliographic Details
Main Authors: Zhu, Shien; Duong, Luan H. K.; Chen, Hui; Liu, Di; Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures; Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; Ternary Weight Neural Network; In-Memory Computing; Convolutional Neural Network; Spin-Transfer Torque Magnetic Random-Access Memory
Online Access:https://hdl.handle.net/10356/162483
Institution: Nanyang Technological University
id sg-ntu-dr.10356-162483
record_format dspace
datestamp 2023-12-15T07:15:15Z
title FAT: an in-memory accelerator with fast addition for ternary weight neural networks
authors Zhu, Shien; Duong, Luan H. K.; Chen, Hui; Liu, Di; Liu, Weichen
affiliations School of Computer Science and Engineering; Parallel and Distributed Computing Centre; HP-NTU Digital Manufacturing Corporate Lab
subjects Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures; Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; Ternary Weight Neural Network; In-Memory Computing; Convolutional Neural Network; Spin-Transfer Torque Magnetic Random-Access Memory
funders Ministry of Education (MOE); Nanyang Technological University
version Submitted/Accepted version
funding_note This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087).
date_available 2022-10-26T04:08:04Z
date_issued 2022
type Journal Article
citation Zhu, S., Duong, L. H. K., Chen, H., Liu, D. & Liu, W. (2022). FAT: an in-memory accelerator with fast addition for ternary weight neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2022.3184276
issn 0278-0070
uri https://hdl.handle.net/10356/162483
doi 10.1109/TCAD.2022.3184276
scopus 2-s2.0-85132696672
language en
grants MOE2019-T2-1-071; MOE2019-T1-001-072; M4082282; M4082087
journal IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
dataset_doi 10.21979/N9/DYKUPV
rights © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCAD.2022.3184276.
format application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures
Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Ternary Weight Neural Network
In-Memory Computing
Convolutional Neural Network
Spin-Transfer Torque Magnetic Random-Access Memory
description Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs offer higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited attention. TWNs run inefficiently on existing IMC devices because their sparsity is not well utilized and the addition operation itself is not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing the carry back to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and to increase the parallelism across memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00× speedup, 1.22× power efficiency and 1.22× area efficiency compared with a state-of-the-art IMC accelerator, ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02× speedup and 12.19× energy efficiency compared with ParaPIM.
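As background for the carry-propagation and carry write-back overhead that the description mentions, the sketch below shows a generic bit-serial ripple-carry addition (standard textbook background, not FAT's sense-amplifier circuit). Every bit position produces a carry that feeds the next position, and a design that keeps the running carry in memory would have to write it back each cycle; that is the cost the proposed fast addition scheme is stated to avoid.

    def bit_serial_add(a_bits, b_bits):
        # Generic bit-serial ripple-carry addition over little-endian bit lists.
        # Each step yields a sum bit plus a carry into the next position; this
        # carry dependency is the serial bottleneck referred to in the abstract.
        carry = 0
        out = []
        for a, b in zip(a_bits, b_bits):
            out.append(a ^ b ^ carry)                # sum bit for this position
            carry = (a & b) | (carry & (a ^ b))      # carry into the next position
        out.append(carry)
        return out

    # 6 + 3 = 9: [0,1,1,0] + [1,1,0,0] -> [1,0,0,1,0] (little-endian)
    print(bit_serial_add([0, 1, 1, 0], [1, 1, 0, 0]))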
author2 School of Computer Science and Engineering
format Article
author Zhu, Shien
Duong, Luan H. K.
Chen, Hui
Liu, Di
Liu, Weichen
title FAT: an in-memory accelerator with fast addition for ternary weight neural networks
publishDate 2022
url https://hdl.handle.net/10356/162483
_version_ 1787136504708988928