SRAM-based compute-in-memory macros for artificial intelligence applications
With the rapid growth of artificial intelligence technology, processing data-intensive workloads on traditional von Neumann hardware faces numerous challenges, such as power-hungry computation and long processing latency. For edge devices, especially battery-powered ones, low power consumption is a critical, high-priority requirement...
Saved in:
Main Author: | Zhang, Xin |
Other Authors: | Kim Tae Hyoung |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Engineering::Electrical and electronic engineering::Integrated circuits |
Online Access: | https://hdl.handle.net/10356/173157 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173157 |
citation |
Zhang, X. (2023). SRAM-based compute-in-memory macros for artificial intelligence applications. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173157 |
doi |
10.32657/10356/173157 |
license |
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). |
supervisor |
Kim Tae Hyoung (THKIM@ntu.edu.sg), School of Electrical and Electronic Engineering |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
topic |
Engineering::Electrical and electronic engineering::Integrated circuits |
description |
With the rapid growth of artificial intelligence technology, processing data-intensive workloads on traditional von Neumann hardware faces numerous challenges, such as power-hungry computation and long processing latency. For edge devices, especially battery-powered ones, low power consumption is a critical, high-priority requirement. The well-known “memory wall” between the computation and storage units forces frequent data transfers, which lead to considerable power consumption and longer processing latency. To break down the “memory wall”, compute-in-memory (CIM) has been proposed as an attractive and promising approach in which both storage and computation are accomplished in the bit-cell array. With the CIM approach, energy efficiency improves tremendously because memory accesses are minimized. This thesis explores SRAM-based compute-in-memory macros for artificial intelligence applications to achieve higher energy efficiency and shorter processing delay.
As the first example, an analog-based transposable CIM macro is proposed to accelerate both the inference and training stages of convolutional neural networks. Compared with previous transposable designs, which accomplish two-way processing with a shared unit, the proposed local-transpose bit-cell achieves data propagation in both directions and raises the array utilization rate to 100%.
Then, to eliminate the intrinsic limitations of analog-based CIM design, such as limited ADC precision and PVT variations, a digital-based CIM macro is introduced that achieves 400 MHz full-precision CIM processing. Because the proposed 64 Kb CIM architecture is built from fully digital circuits, it delivers processing with no accuracy loss alongside high energy efficiency.
Finally, a fully digital, versatile CIM macro is presented for accelerating various types of machine-learning algorithms. The input and weight precision are reconfigurable from 1 bit to 16 bits. With weight-stationary addition, operand-stationary addition, and bit-serial multiplication implemented in the CIM macro, both self-organizing maps and convolutional neural networks can be accelerated by the proposed architecture. |
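For context on the bit-serial multiplication the abstract mentions, the following is a minimal software sketch of how a digital CIM macro typically forms a dot product one input bit-plane at a time. The function name and parameters are illustrative assumptions, not taken from the thesis itself.

```python
def bit_serial_mac(inputs, weights, in_bits=4):
    """Software model of a bit-serial multiply-accumulate (dot product).

    A digital CIM macro streams the inputs one bit-plane per cycle:
    each weight is gated (ANDed) by one input bit, the partial products
    are summed by an adder tree, and a shift-and-add accumulator applies
    the binary weight of that bit position. Illustrative sketch only.
    """
    acc = 0
    for b in range(in_bits - 1, -1, -1):              # MSB-first bit-planes
        bit_plane = [(x >> b) & 1 for x in inputs]    # one bit per input
        partial = sum(p * w for p, w in zip(bit_plane, weights))
        acc = (acc << 1) + partial                    # shift-and-add
    return acc

# Example: dot([3, 5], [2, 4]) = 3*2 + 5*4 = 26
print(bit_serial_mac([3, 5], [2, 4]))  # → 26
```

Reconfigurable input precision, as described in the third macro, falls out naturally: the same hardware simply runs for a different number of bit-plane cycles (`in_bits`) depending on the chosen precision.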
author |
Zhang, Xin |
author2 |
Kim Tae Hyoung |
title |
SRAM-based compute-in-memory macros for artificial intelligence applications |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173157 |