A novel design of power-efficient reconfigurable multiplier targeting at increasing the throughput of the CGRA

Bibliographic Details
Main Author: Li, Jiaxu
Other Authors: Goh Wang Ling
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171880
Institution: Nanyang Technological University
Description
Summary: As the computational demands of artificial intelligence (AI) escalate, edge computing has grown in significance. Edge computing emphasizes local computation on devices, particularly using Software-Defined Chips (SDCs) such as the Coarse-Grained Reconfigurable Architecture (CGRA), which is known for its energy efficiency and flexible reconfiguration. On the algorithmic side, edge computing's tolerance for reduced computational precision has led to the widespread use of quantized AI models on CGRAs, both to raise computational performance further and to explore new ways around the computing-power bottleneck imposed by the limits of Moore's Law. This research aims to increase the throughput of the CGRA chip. It introduces a power-efficient reconfigurable multiplier that can perform parallel multiplications and replaces the conventional multiplier inside the Processing Element (PE). Comprehensive simulations in 40 nm CMOS technology showed that, with a workload of continuous short-bit multiplications as the benchmark, the throughput of the PE rose notably compared with the PCAE chip using the original Cadence IP multiplier. After the array multiplication application was implemented on the optimized CGRA, throughput improved by 98% for 4-bit multiplications and by 296% for 2-bit multiplications. These results highlight the substantial gain in PE performance from integrating the proposed reconfigurable multiplier into the CGRA design, and suggest greater potential for handling quantized AI models.
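
The idea of one multiplier serving several short-bit multiplications at once can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the design from the thesis: the 8-bit operand width, the lane layout, and the names `split_lanes` and `reconfigurable_multiply` are all hypothetical, since this record does not describe the circuit.

```python
def split_lanes(word, lane_bits, total_bits=8):
    """Split an unsigned word into lanes of lane_bits each, lowest lane first.

    total_bits=8 is an assumed operand width, not a figure from the thesis.
    """
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def reconfigurable_multiply(a, b, lane_bits, total_bits=8):
    """Hypothetical behavioral model of a multiplier with a selectable lane width.

    Matching lanes of a and b are multiplied independently, so one operation
    yields total_bits // lane_bits products.
    """
    if total_bits % lane_bits != 0:
        raise ValueError("lane_bits must divide total_bits")
    lanes_a = split_lanes(a, lane_bits, total_bits)
    lanes_b = split_lanes(b, lane_bits, total_bits)
    return [x * y for x, y in zip(lanes_a, lanes_b)]


# The same operand pair under three assumed configurations.
full = reconfigurable_multiply(0xA3, 0x57, lane_bits=8)     # one 8-bit product
nibbles = reconfigurable_multiply(0xA3, 0x57, lane_bits=4)  # two 4-bit products
pairs = reconfigurable_multiply(0xA3, 0x57, lane_bits=2)    # four 2-bit products
```

In this model, one operation produces one, two, or four products depending on the lane width, which would ideally give 2x and 4x throughput for 4-bit and 2-bit work. The improvements reported in the summary, 98% and 296%, correspond to factors of 1.98 and 3.96. Whether the thesis design uses exactly this lane arrangement is not stated here.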