Energy efficient SoC-based CGRA hardware computation accelerator
| Field | Value |
|---|---|
| Main Author | |
| Other Authors | |
| Format | Thesis-Master by Coursework |
| Language | English |
| Published | Nanyang Technological University, 2024 |
| Subjects | |
| Online Access | https://hdl.handle.net/10356/175959 |
| Institution | Nanyang Technological University |


Summary:

Driven by the needs of modern society, demand for chips is growing rapidly. As manufacturing processes and design workflows continue to improve, the requirements placed on chips continue to rise. To keep pace with this trend, this dissertation proposes a new architecture based on the Coarse-Grained Reconfigurable Array (CGRA). It enables chips to achieve higher performance, lower power consumption, smaller size, and better stability across a range of operating environments, making them more competitive in the market.

CGRAs bridge the gap between highly efficient fixed-function accelerators and flexible general-purpose processors. A CGRA consists of an on-chip array of word-level processing elements (PEs) linked by an interconnect, and both the PEs and the interconnect can be reconfigured every cycle according to the contents of the configuration memory. The compiler must therefore map the compute-intensive loop kernels of an application onto the CGRA in both space (which PE performs each operation) and time (in which cycle) by generating the configuration memory contents. The simplicity and parallelism inherent in the architecture, combined with a capable compiler, allow a CGRA to pair hardware-like efficiency with software-like programmability.

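
To make the idea of spatial and temporal mapping concrete, the following is a toy Python model. It is a sketch under stated assumptions, not the architecture proposed in this dissertation: it assumes a hypothetical 2x2 mesh in which each PE has one output register and can read only itself or its four orthogonal neighbours, a configuration memory holding one context (whole-array configuration) per cycle, and a made-up instruction set (`IN_A`, `IN_B`, `MUL`, `ADD_IMM`, `NOP`). The names `PEConfig`, `ToyCGRA`, and `run_loop` are illustrative only. The loop body `y[i] = a[i] * b[i] + 5` is spread across three PEs and three cycles, and PE (0, 0) is reconfigured from a load in cycle 0 to an adder in cycle 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, col) position of a PE in the mesh


@dataclass(frozen=True)
class PEConfig:
    """Hypothetical per-cycle configuration word for one processing element."""
    op: str                        # "IN_A", "IN_B", "MUL", "ADD_IMM" or "NOP"
    srcs: Tuple[Coord, ...] = ()   # PEs whose output registers feed this op
    imm: int = 0                   # immediate operand used by ADD_IMM


# One context = the configuration of the whole array for a single cycle.
# PEs missing from a context are idle and keep their register value.
Context = Dict[Coord, PEConfig]


def is_neighbour(a: Coord, b: Coord) -> bool:
    """Assumed mesh interconnect: a PE can read itself or its 4 orthogonal neighbours."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1


class ToyCGRA:
    def __init__(self, rows: int, cols: int, config_mem: List[Context]) -> None:
        self.coords = {(r, c) for r in range(rows) for c in range(cols)}
        self.config_mem = config_mem
        # Reject mappings that the assumed interconnect cannot route.
        for ctx in config_mem:
            for pe, cfg in ctx.items():
                if pe not in self.coords:
                    raise ValueError(f"no PE at {pe}")
                for src in cfg.srcs:
                    if src not in self.coords or not is_neighbour(pe, src):
                        raise ValueError(f"route {src} -> {pe} is not available")
        self.regs = {pe: 0 for pe in self.coords}  # one output register per PE

    def step(self, ctx: Context, a: int, b: int) -> None:
        """One clock cycle: all PEs read last cycle's registers, then update together."""
        old = dict(self.regs)
        for pe, cfg in ctx.items():
            x = [old[src] for src in cfg.srcs]
            if cfg.op == "IN_A":
                self.regs[pe] = a
            elif cfg.op == "IN_B":
                self.regs[pe] = b
            elif cfg.op == "MUL":
                self.regs[pe] = x[0] * x[1]
            elif cfg.op == "ADD_IMM":
                self.regs[pe] = x[0] + cfg.imm
            elif cfg.op != "NOP":
                raise ValueError(f"unknown op {cfg.op}")

    def run_loop(self, xs: List[int], ys: List[int], out_pe: Coord) -> List[int]:
        """Execute the mapped loop body once per iteration by stepping through every context."""
        results = []
        for a, b in zip(xs, ys):
            for ctx in self.config_mem:  # temporal dimension: one context per cycle
                self.step(ctx, a, b)
            results.append(self.regs[out_pe])
        return results


# Loop kernel  y[i] = a[i] * b[i] + 5  mapped onto a 2x2 array in 3 cycles per iteration.
mapping: List[Context] = [
    {(0, 0): PEConfig("IN_A"), (1, 1): PEConfig("IN_B")},      # cycle 0: load a[i], b[i]
    {(0, 1): PEConfig("MUL", srcs=((0, 0), (1, 1)))},          # cycle 1: a[i] * b[i]
    {(0, 0): PEConfig("ADD_IMM", srcs=((0, 1),), imm=5)},      # cycle 2: PE (0, 0) reused as adder
]

cgra = ToyCGRA(2, 2, mapping)
print(cgra.run_loop([1, 2, 3], [4, 5, 6], out_pe=(0, 0)))  # [9, 15, 23]
```

In this simplified model, iterations run back to back, so most PEs sit idle in most cycles. Practical CGRA compilers commonly overlap successive iterations (for example with modulo scheduling) to keep more of the array busy.
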
This dissertation introduces a new acceleration structure that incorporates external SRAM into the existing CGRA computational flow. It combines high performance, energy efficiency, and versatility to support a wide range of application domains. First, the structure significantly reduces the communication cost between the logic units and the SRAM, which is particularly important for applications whose requirements change rapidly. Second, it achieves hardware/software co-design through compiler mapping algorithms, which simplifies the development process and reduces the technical demands placed on developers. Finally, the structure can acquire new acceleration features through software upgrades, greatly extending the product lifecycle.