iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product


Bibliographic Details
Main Authors: Zhu, Shien; Huai, Shuo; Xiong, Guochu; Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Subjects:
Online Access: https://hdl.handle.net/10356/170218
Institution: Nanyang Technological University
Description
Abstract: Ternary Neural Networks (TNNs) achieve an excellent trade-off between model size, speed, and accuracy by quantizing weights and activations into ternary values {+1, 0, -1}. Ternary multiplications in TNNs reduce to lightweight bitwise operations, which suit In-Memory Computing (IMC) platforms well. Therefore, many IMC-based TNN accelerators have been proposed. They build dedicated ternary multiplication cells or utilize efficient bitwise operations on IMC architectures. However, existing ternary value accumulation schemes on IMC architectures are inefficient: they either extend the sign bit of integer operands or conduct two-round accumulation with specially designed encoding, which incurs long latency and extra memory write overhead. Moreover, existing IMC-based TNN accelerators overlook TNNs' sparsity and conduct operations on zero weights, resulting in unnecessary power consumption and latency. In this paper, we propose iMAT to accelerate TNNs with operator-, architecture-, and layer-level optimizations. First, we propose a single-round Ternary Variable-Bitwidth Accumulation scheme, which efficiently extends the sign bit of the addition result without extra memory write overhead. Second, we propose an in-memory accelerator with enhanced sensing circuits for the accumulation scheme and a Sparse Dot Product Unit that exploits TNNs' weight sparsity by skipping the unnecessary operations on zero weights. Further, we propose Fused Scaling Functions, which combine the scaling, activation, normalization, and quantization layers to reduce hardware complexity without affecting model accuracy. Simulation results show that, compared with dense in-memory TNN accelerators, iMAT achieves up to 2.7× speedup and 3.7× higher energy efficiency on ternary ResNet-18.
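
The abstract's two central ideas, that a ternary multiplication reduces to bitwise operations and that zero weights can be skipped, can be illustrated in software. The following is a minimal sketch and not the iMAT hardware design: the two-bit-plane encoding (one mask for +1 entries, one for -1 entries) and the function names `pack_ternary`, `ternary_dot`, and `sparse_ternary_dot` are assumptions made for illustration only and do not come from the paper.

```python
def pack_ternary(values):
    """Pack ternary values into two bit masks (assumed encoding, for illustration).

    Bit i of pos_mask is set when values[i] == +1;
    bit i of neg_mask is set when values[i] == -1.
    """
    pos_mask = 0
    neg_mask = 0
    for i, v in enumerate(values):
        if v not in (-1, 0, 1):
            raise ValueError(f"not a ternary value: {v!r}")
        if v == 1:
            pos_mask |= 1 << i
        elif v == -1:
            neg_mask |= 1 << i
    return pos_mask, neg_mask


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def ternary_dot(weights, activations):
    """Dot product of two ternary vectors using only AND and popcount."""
    wp, wn = pack_ternary(weights)
    ap, an = pack_ternary(activations)
    # Matching signs contribute +1 each, opposite signs contribute -1 each.
    same_sign = popcount(wp & ap) + popcount(wn & an)
    opposite_sign = popcount(wp & an) + popcount(wn & ap)
    return same_sign - opposite_sign


def sparse_ternary_dot(weights, activations):
    """Dot product that skips zero weights; also returns the number of
    accumulations actually performed."""
    total = 0
    performed = 0
    for w, a in zip(weights, activations):
        if w == 0:
            continue  # zero weight: no operation needed
        total += a if w == 1 else -a
        performed += 1
    return total, performed


if __name__ == "__main__":
    weights = [1, 0, -1, 0, 0, 1, -1, 0]
    activations = [1, -1, -1, 0, 1, -1, 0, 1]
    print(ternary_dot(weights, activations))
    print(sparse_ternary_dot(weights, activations))
```

In this example half of the weights are zero, so the sparse version performs four accumulations instead of eight while producing the same result as the dense bitwise version. How the paper realizes the same savings in memory arrays and sensing circuits is not described in this record.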