SRAM-based compute-in-memory macros for artificial intelligence applications

With the boom of artificial intelligence technology, the processing of intensive data on traditional von Neumann hardware faces numerous challenges, such as power-hungry computation and unsatisfactory processing latency. For edge devices, especially battery-powered ones, low power consumption is a critical, high-priority requirement. It is well known that the “memory wall” between the computation and storage units requires frequent data transmission, which leads to considerable power consumption and longer processing latency. To break down the “memory wall”, compute-in-memory (CIM) has been proposed as an attractive and promising approach in which both the storage and computation functions are accomplished in the bit-cell array. With the CIM approach, energy efficiency is improved tremendously because memory accesses are minimized. This thesis explores SRAM-based compute-in-memory macros for artificial intelligence applications to achieve higher energy efficiency and shorter processing delay. As the first example, an analog-based transposable CIM macro is proposed to accelerate both the inference and training stages of convolutional neural networks. Compared with previous transposable designs, which accomplish two-way processing with a shared unit, the proposed local transpose bit-cell achieves bidirectional data propagation and improves the array utilization rate to 100%. Then, aiming to eliminate the intrinsic limitations of analog-based CIM design, such as limited ADC precision and PVT variations, a digital-based CIM macro is introduced to achieve 400 MHz full-precision CIM processing. No-accuracy-loss processing and high energy efficiency are achieved within the proposed 64 Kb CIM architecture thanks to its fully digital circuits. Finally, a fully digital, versatile CIM macro is presented for accelerating various types of machine learning algorithms. Its input and weight precisions are reconfigurable from 1 bit to 16 bits. With weight-stationary addition, operands-stationary addition, and bit-serial multiplication implemented in the CIM macro, self-organizing maps and convolutional neural networks can both be accelerated by the proposed architecture.
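The bit-serial multiplication mentioned in the thesis abstract is a standard digital-CIM technique: a multi-bit multiplication is decomposed into 1-bit AND operations against the stored weights, and the partial sums are shifted and accumulated according to bit significance. A minimal software sketch of that principle (illustrative only; the function name and loop structure are assumptions, not the thesis's actual circuits):

```python
# Illustrative sketch of bit-serial multiply-accumulate, the arithmetic
# style used by digital CIM macros. Input bits are streamed LSB-first;
# each cycle performs a 1-bit multiply (AND) per stored weight, sums the
# column, and shift-accumulates the partial result.

def bit_serial_mac(inputs, weights, input_bits=4):
    """Compute sum(x * w) by streaming the input bits one at a time."""
    assert len(inputs) == len(weights)
    acc = 0
    for b in range(input_bits):
        # 1-bit "multiply": AND the b-th bit of each input with its weight
        partial = sum(((x >> b) & 1) * w for x, w in zip(inputs, weights))
        acc += partial << b  # weight the partial sum by bit significance
    return acc

print(bit_serial_mac([3, 5, 7], [2, 4, 1]))  # 3*2 + 5*4 + 7*1 = 33
```

In hardware, the per-bit AND happens inside the bit-cell array and the column sum in a digital adder tree, so precision is reconfigurable simply by changing how many bit-cycles are streamed.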


Bibliographic Details
Main Author: Zhang, Xin
Other Authors: Kim Tae Hyoung
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/173157
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-173157
record_format dspace
spelling sg-ntu-dr.10356-173157 2024-02-01T09:53:44Z SRAM-based compute-in-memory macros for artificial intelligence applications Zhang, Xin Kim Tae Hyoung School of Electrical and Electronic Engineering THKIM@ntu.edu.sg Engineering::Electrical and electronic engineering::Integrated circuits With the boom of artificial intelligence technology, the processing of intensive data on traditional von Neumann hardware faces numerous challenges, such as power-hungry computation and unsatisfactory processing latency. For edge devices, especially battery-powered ones, low power consumption is a critical, high-priority requirement. It is well known that the “memory wall” between the computation and storage units requires frequent data transmission, which leads to considerable power consumption and longer processing latency. To break down the “memory wall”, compute-in-memory (CIM) has been proposed as an attractive and promising approach in which both the storage and computation functions are accomplished in the bit-cell array. With the CIM approach, energy efficiency is improved tremendously because memory accesses are minimized. This thesis explores SRAM-based compute-in-memory macros for artificial intelligence applications to achieve higher energy efficiency and shorter processing delay. As the first example, an analog-based transposable CIM macro is proposed to accelerate both the inference and training stages of convolutional neural networks. Compared with previous transposable designs, which accomplish two-way processing with a shared unit, the proposed local transpose bit-cell achieves bidirectional data propagation and improves the array utilization rate to 100%. Then, aiming to eliminate the intrinsic limitations of analog-based CIM design, such as limited ADC precision and PVT variations, a digital-based CIM macro is introduced to achieve 400 MHz full-precision CIM processing. No-accuracy-loss processing and high energy efficiency are achieved within the proposed 64 Kb CIM architecture thanks to its fully digital circuits. Finally, a fully digital, versatile CIM macro is presented for accelerating various types of machine learning algorithms. Its input and weight precisions are reconfigurable from 1 bit to 16 bits. With weight-stationary addition, operands-stationary addition, and bit-serial multiplication implemented in the CIM macro, self-organizing maps and convolutional neural networks can both be accelerated by the proposed architecture. Doctor of Philosophy 2024-01-18T12:01:16Z 2024-01-18T12:01:16Z 2023 Thesis-Doctor of Philosophy Zhang, X. (2023). SRAM-based compute-in-memory macros for artificial intelligence applications. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173157 https://hdl.handle.net/10356/173157 10.32657/10356/173157 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering::Integrated circuits
spellingShingle Engineering::Electrical and electronic engineering::Integrated circuits
Zhang, Xin
SRAM-based compute-in-memory macros for artificial intelligence applications
description With the boom of artificial intelligence technology, the processing of intensive data on traditional von Neumann hardware faces numerous challenges, such as power-hungry computation and unsatisfactory processing latency. For edge devices, especially battery-powered ones, low power consumption is a critical, high-priority requirement. It is well known that the “memory wall” between the computation and storage units requires frequent data transmission, which leads to considerable power consumption and longer processing latency. To break down the “memory wall”, compute-in-memory (CIM) has been proposed as an attractive and promising approach in which both the storage and computation functions are accomplished in the bit-cell array. With the CIM approach, energy efficiency is improved tremendously because memory accesses are minimized. This thesis explores SRAM-based compute-in-memory macros for artificial intelligence applications to achieve higher energy efficiency and shorter processing delay. As the first example, an analog-based transposable CIM macro is proposed to accelerate both the inference and training stages of convolutional neural networks. Compared with previous transposable designs, which accomplish two-way processing with a shared unit, the proposed local transpose bit-cell achieves bidirectional data propagation and improves the array utilization rate to 100%. Then, aiming to eliminate the intrinsic limitations of analog-based CIM design, such as limited ADC precision and PVT variations, a digital-based CIM macro is introduced to achieve 400 MHz full-precision CIM processing. No-accuracy-loss processing and high energy efficiency are achieved within the proposed 64 Kb CIM architecture thanks to its fully digital circuits. Finally, a fully digital, versatile CIM macro is presented for accelerating various types of machine learning algorithms. Its input and weight precisions are reconfigurable from 1 bit to 16 bits. With weight-stationary addition, operands-stationary addition, and bit-serial multiplication implemented in the CIM macro, self-organizing maps and convolutional neural networks can both be accelerated by the proposed architecture.
author2 Kim Tae Hyoung
author_facet Kim Tae Hyoung
Zhang, Xin
format Thesis-Doctor of Philosophy
author Zhang, Xin
author_sort Zhang, Xin
title SRAM-based compute-in-memory macros for artificial intelligence applications
title_short SRAM-based compute-in-memory macros for artificial intelligence applications
title_full SRAM-based compute-in-memory macros for artificial intelligence applications
title_fullStr SRAM-based compute-in-memory macros for artificial intelligence applications
title_full_unstemmed SRAM-based compute-in-memory macros for artificial intelligence applications
title_sort sram-based compute-in-memory macros for artificial intelligence applications
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/173157
_version_ 1789968690498764800