Design and optimization of DSP architectures for multi-context FPGA with dynamic reconfiguration

Field Programmable Gate Arrays (FPGAs) are now widely adopted as hardware accelerators due to their inherent parallel processing capability. However, the sub-optimal logic utilization and large reconfiguration latency in conventional single-context FPGAs pose constraints on their usage for applicat...

Bibliographic Details
Main Author: Rakesh Vijayakumara Warrier
Other Authors: Vun Chan Hua, Nicholas
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/69398
Institution: Nanyang Technological University
id sg-ntu-dr.10356-69398
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Rakesh Vijayakumara Warrier
Design and optimization of DSP architectures for multi-context FPGA with dynamic reconfiguration
description Field Programmable Gate Arrays (FPGAs) are now widely adopted as hardware accelerators due to their inherent parallel processing capability. However, the sub-optimal logic utilization and large reconfiguration latency of conventional single-context FPGAs constrain their use in applications such as adaptive control systems in vehicles and software-defined radio, where frequent context switching or resource sharing between tasks is required. Multi-context FPGAs with dynamic reconfiguration capability have therefore been introduced to allow rapid reconfiguration of the FPGA and hence increase the effective logic density. The current generation of multi-context FPGAs typically uses a dynamically reconfigurable architecture based on static Random Access Memory (RAM) to implement multiple configuration planes that enable fast switching between contexts. The main challenges of these multi-context FPGAs are limited on-chip storage and relatively long reconfiguration latency (of the order of milliseconds). With technology scaling down to the nanoscale, new generations of hybrid multi-context FPGA architectures have been developed, such as the CMOS/NAnoTechnology reconfigURablE architecture (NATURE), which uses on-chip nano RAMs to store multiple configurations and enable extremely fast runtime reconfiguration (of the order of picoseconds). This type of FPGA enables cycle-by-cycle reconfiguration and temporal logic folding, improving logic density and area-delay product by more than an order of magnitude compared to traditional FPGAs. However, the fine granularity of this type of architecture limits its use as a high-performance hardware accelerator for compute-intensive arithmetic operations. This research explores how DSP architectures can be designed for the hybrid multi-context NATURE platform in order to fully exploit its advantages and possibilities.
The performance of various compute-intensive signal processing kernels is used to benchmark the improvements achieved by the proposed DSP architectures. A full-block dynamically reconfigurable DSP architecture is first presented, which can be reconfigured at runtime to implement different arithmetic functions in different clock cycles. To fully exploit temporal logic folding in NATURE, the DSP block is then extended to support pipeline-level reconfiguration, which allows individual pipeline stages to be reconfigured independently. To enable efficient implementation of mixed-precision applications, the capability to dynamically fracture the internal compute path of the DSP block is also incorporated into the design. The design automation tool for the NATURE platform is extended to map compute-intensive kernels efficiently onto the proposed DSP architecture(s) by exploring optimum resource sharing and area/power reduction. A design space exploration algorithm is developed and incorporated into the mapping tool to determine the optimal configuration for a given input circuit, based on the design requirements and user constraints. The proposed technique automatically explores the different folding levels and DSP modes (configurations), evaluates their area/power trade-offs, and determines the most efficient mapping of the chosen configuration, which is subsequently fed to the mapping flow to generate the bitstream. These contributions allow system designers to design and map compute-intensive arithmetic kernels on next-generation hybrid multi-context FPGA platforms with ease, while providing the high computational performance and energy efficiency required by many modern applications.
author2 Vun Chan Hua, Nicholas
author_facet Vun Chan Hua, Nicholas
Rakesh Vijayakumara Warrier
format Theses and Dissertations
author Rakesh Vijayakumara Warrier
author_sort Rakesh Vijayakumara Warrier
title Design and optimization of DSP architectures for multi-context FPGA with dynamic reconfiguration
publishDate 2016
url https://hdl.handle.net/10356/69398
_version_ 1759854088294498304
spelling sg-ntu-dr.10356-69398 2023-03-04T00:38:47Z Design and optimization of DSP architectures for multi-context FPGA with dynamic reconfiguration Rakesh Vijayakumara Warrier Vun Chan Hua, Nicholas School of Computer Science and Engineering Centre for High Performance Embedded Systems DRNTU::Engineering::Computer science and engineering COMPUTER ENGINEERING 2016-12-23T06:37:27Z 2016-12-23T06:37:27Z 2016 Thesis Rakesh Vijayakumara Warrier. (2016). Design and optimization of DSP architectures for multi-context FPGA with dynamic reconfiguration. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/69398 10.32657/10356/69398 en 207 p. application/pdf