FAT: an in-memory accelerator with fast addition for ternary weight neural networks
Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited attention. TWNs are inefficient on existing IMC devices because their sparsity is not well utilized and their addition operations are not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which exploits the sparsity of TWNs to skip null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier that avoids the time overhead of both carry propagation and writing the carry back to memory cells. Third, we propose a Combined-Stationary data mapping that reduces the data movement of activations and weights and increases parallelism across memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00× speedup, 1.22× power efficiency and 1.22× area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02× speedup and 12.19× energy efficiency compared with ParaPIM.
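As a point of reference for the abstract, the minimal C sketch below (not part of the paper; the function name, loop structure, and int8 activation type are assumptions for illustration) shows why ternary weights turn multiply-accumulate into sign-controlled additions and why zero weights can be skipped outright, which is the property FAT's Sparse Addition Control Unit exploits.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical software analogue of a ternary-weight dot product.
 * With weights restricted to {-1, 0, +1}, every "multiply" collapses
 * into an add, a subtract, or a skip; the skip branch on zero weights
 * mirrors the null-operation skipping done in hardware by FAT's
 * Sparse Addition Control Unit. */
static int32_t ternary_dot(const int8_t *act, const int8_t *w, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i] == 0) {
            continue;            /* zero weight: no addition is issued at all */
        } else if (w[i] > 0) {
            acc += act[i];       /* weight +1: accumulate the activation */
        } else {
            acc -= act[i];       /* weight -1: subtract the activation */
        }
    }
    return acc;
}

int main(void)
{
    const int8_t act[] = { 3, -5, 7, 2, -1, 4 };
    const int8_t w[]   = { 1,  0, -1, 0,  1, 0 };  /* 50% zero weights in this toy case */
    printf("dot = %d\n", ternary_dot(act, w, sizeof w / sizeof w[0]));  /* 3 - 7 - 1 = -5 */
    return 0;
}
```

In the accelerator described above, the equivalent additions are performed in memory at the Sense Amplifier level rather than by a processor loop; the sketch only mirrors the arithmetic, not the hardware.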
Main Authors: | Zhu, Shien; Duong, Luan H. K.; Chen, Hui; Liu, Di; Liu, Weichen |
---|---|
Other Authors: | School of Computer Science and Engineering; Parallel and Distributed Computing Centre; HP-NTU Digital Manufacturing Corporate Lab |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures; Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; Ternary Weight Neural Network; In-Memory Computing; Convolutional Neural Network; Spin-Transfer Torque Magnetic Random-Access Memory |
Online Access: | https://hdl.handle.net/10356/162483 |
Institution: | Nanyang Technological University |
Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Citation: | Zhu, S., Duong, L. H. K., Chen, H., Liu, D. & Liu, W. (2022). FAT: an in-memory accelerator with fast addition for ternary weight neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2022.3184276 |
ISSN: | 0278-0070 |
DOI: | 10.1109/TCAD.2022.3184276 |
Scopus ID: | 2-s2.0-85132696672 |
Related DOI: | 10.21979/N9/DYKUPV |
Version: | Submitted/Accepted version |
Funding: | This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087). |
Rights: | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCAD.2022.3184276. |