SRAM-based compute-in-memory macros for artificial intelligence applications
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/173157
Institution: Nanyang Technological University
Summary: With the rapid growth of artificial intelligence technology, processing data-intensive workloads on traditional von Neumann hardware faces numerous challenges, such as power-hungry computing and unsatisfactory processing latency. For edge devices, especially battery-powered ones, low power consumption is a critical, high-priority requirement. It is well known that the "memory wall" between the computation and storage units forces frequent data transfers, which lead to considerable power consumption and longer processing latency. To break down the "memory wall", compute-in-memory (CIM) has been proposed as an attractive and promising approach in which storage and computation are both accomplished within the bit-cell array. With the CIM approach, energy efficiency improves tremendously because memory accesses are minimized. This thesis explores SRAM-based compute-in-memory macros for artificial intelligence applications to achieve higher energy efficiency and shorter processing delay.
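The CIM principle the summary describes can be illustrated with a small sketch. This is not from the thesis; it is a hypothetical model, in plain Python, of the idea that weights stay resident in the memory array and each column accumulates a multiply-accumulate result locally, so only inputs and outputs cross the memory boundary.

```python
# Illustrative model of compute-in-memory (hypothetical, not the thesis design):
# the weight matrix is "stored" in the array and never moved; each output is
# accumulated where the weights reside, mimicking per-bitline accumulation.

def cim_matvec(weight_array, inputs):
    """Compute y = W.x row by row, as a stand-in for in-array accumulation."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weight_array]

# Example: a 2x2 weight array held in place, one input vector streamed in.
print(cim_matvec([[1, 2], [3, 4]], [1, 1]))  # -> [3, 7]
```

In real SRAM-based macros the per-row products and column sums are produced by the bit cells and peripheral circuits rather than by software loops; the sketch only captures the data-movement argument, not the circuit implementation.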
As the first example, an analog-based transposable CIM macro is proposed to accelerate both the inference and training stages of convolutional neural networks. Compared with previous transposable designs, which accomplish two-way processing through a shared unit, the proposed local-transpose bit-cell achieves bidirectional data propagation and improves the array utilization rate to 100%.
Then, to eliminate the intrinsic limitations of analog-based CIM designs, such as limited ADC precision and PVT variations, a digital-based CIM macro is introduced that achieves 400 MHz full-precision CIM processing. Processing with no accuracy loss and high energy efficiency are achieved within the proposed 64 Kb CIM architecture thanks to its fully digital circuits.
Finally, a fully digital, versatile CIM macro is presented for accelerating various types of machine-learning algorithms. Input and weight precisions are reconfigurable from 1 bit to 16 bits. With weight-stationary addition, operand-stationary addition, and bit-serial multiplication implemented in the CIM macro, both self-organizing maps and convolutional neural networks can be accelerated by the proposed architecture.
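Bit-serial multiplication, named above as one of the macro's operating modes, processes one input bit per cycle and reuses the same adder hardware for any input precision. The following sketch is an assumption about how such a scheme typically works (function name and structure are illustrative, not taken from the thesis):

```python
# Hypothetical sketch of bit-serial multiply-accumulate: inputs are fed one
# bit per cycle (LSB first); each cycle the array sums the weights gated by
# the current input bits, and the partial sum is shifted by the bit's
# significance before accumulation. Precision is set by the cycle count.

def bit_serial_mac(inputs, weights, in_bits=8):
    """Accumulate sum(x * w) over in_bits serial cycles."""
    acc = 0
    for b in range(in_bits):
        # Partial sum of weights whose corresponding input bit b is 1.
        partial = sum(w for x, w in zip(inputs, weights) if (x >> b) & 1)
        acc += partial << b  # weight the partial sum by bit significance
    return acc

# Matches the parallel dot product: 3*2 + 5*4 = 26.
print(bit_serial_mac([3, 5], [2, 4]))  # -> 26
```

This is why such macros can reconfigure input precision simply by changing the number of serial cycles, trading latency for bit width without altering the datapath.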