A 65-nm 8T SRAM compute-in-memory macro with column ADCs for processing neural networks
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/163744
Institution: Nanyang Technological University
Summary: In this work, we present a novel 8T static random access memory (SRAM)-based compute-in-memory (CIM) macro for processing neural networks with high energy efficiency. The proposed 8T bitcell is free from disturb issues thanks to read paths decoupled by adding two extra transistors to the standard 6T bitcell. A 128 × 128 8T SRAM array offers massively parallel binary multiply-and-accumulate (MAC) operations with 64 binary inputs (0/1) and 64 × 128 binary weights (+1/-1). After the parallel MAC operations, 128 column-based neurons generate 128 × 1-5 bit outputs in parallel. Each column-based neuron comprises 64 bitcells for the dot product, 32 bitcells for the analog-to-digital converter (ADC), and 32 bitcells for offset calibration. The column ADC with 32 replica SRAM bitcells converts the analog MAC result (i.e., a differential read-bitline (RBL/RBLb) voltage) to a 1-5 bit output code by sweeping its reference levels over 1-31 cycles (i.e., $2^N - 1$ cycles for an $N$-bit ADC). The measured differential nonlinearity (DNL) and integral nonlinearity (INL) are +0.314/-0.256 least significant bit (LSB) and +0.27/-0.116 LSB, respectively, after offset calibration. The simulated image classification accuracy is 96.37% on the Modified National Institute of Standards and Technology (MNIST) dataset using a multilayer perceptron (MLP) with two hidden layers, and 87.1%/82.66% on CIFAR-10 using VGG-like/ResNet-18 convolutional neural networks (CNNs), demonstrating slight accuracy degradations (0.67%-1.34%) compared with the software baselines. A test chip with a 16K 8T SRAM bitcell array is fabricated in a 65-nm process. The measured energy efficiency is 490-15.8 TOPS/W for 1-5 bit ADC resolution with a 0.45-/0.8-V core supply.
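To make the column-neuron operation concrete, below is a minimal behavioral sketch (not the authors' circuit) of one column: a 64-element binary MAC followed by a ramp-style ADC that compares the result against $2^N - 1$ swept reference levels, one per cycle. The function names (`binary_mac`, `column_adc`), the uniform reference ramp, and the ±64 full-scale range are illustrative assumptions inferred from the abstract.

```python
import numpy as np

def binary_mac(inputs, weights):
    """Dot product of 64 binary inputs (0/1) and 64 binary weights (+1/-1).

    Behavioral model of the analog MAC on the differential read bitline;
    result lies in [-64, +64] for a 64-bitcell column.
    """
    assert set(np.unique(inputs)) <= {0, 1}
    assert set(np.unique(weights)) <= {-1, 1}
    return int(inputs @ weights)

def column_adc(mac_value, n_bits, full_scale=64):
    """Ramp-ADC model: sweep 2^N - 1 uniformly spaced reference levels
    (one comparison per cycle, so 1-31 cycles for 1-5 bits) and output
    the count of levels the MAC value crosses. Uniform references and
    the full-scale value are assumptions, not the measured design."""
    levels = 2 ** n_bits - 1
    code = 0
    for k in range(1, levels + 1):
        ref = -full_scale + 2 * full_scale * k / (levels + 1)
        if mac_value >= ref:
            code += 1
    return code  # N-bit output code, 0 .. 2^N - 1

# Example: one column with random binary activations and weights.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 64)       # 64 binary inputs (0/1)
w = rng.choice([-1, 1], 64)      # 64 binary weights (+1/-1)
mac = binary_mac(x, w)
print(mac, column_adc(mac, n_bits=5))  # 5-bit code after 31 cycles
```

In the macro, 128 such columns run in parallel, so one MAC-plus-conversion pass yields 128 output codes at once; only the conversion depth (number of sweep cycles) changes with the chosen ADC resolution.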