Exploiting DSP block capabilities in FPGA high level design flows

Bibliographic Details
Main Author: Ronak Bajaj
Other Authors: Suhaib A Fahmy
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/69815
Institution: Nanyang Technological University
Description
Summary: The embedded DSP blocks in modern Field Programmable Gate Arrays (FPGAs) are highly capable and support a variety of data path configurations. They evolved to support a range of applications requiring significant amounts of fast arithmetic. In addition to these computational capabilities, DSP blocks support runtime dynamic programmability, which allows a single DSP block to be used as a different computational block in every clock cycle. Vendor synthesis tools can infer the use of these resources, but they do not exploit their full capabilities, especially the dynamic configuration. Specific language structures are suggested for implementing standard applications, but designs that do not fit these standard templates can suffer from inefficient mapping. High-level synthesis (HLS) tools rely on the backend synthesis tools to map efficiently to the target architecture. This thesis explores how DSP blocks can be exploited to produce high-throughput computational kernels that operate close to the theoretical limit of the primitives, and how their dynamic programmability can be exploited to create efficient implementations. We show that this can be achieved using a high-level description, but only by considering architectural information at higher levels. An automated tool flow is presented that takes a high-level description of a computational kernel in C and generates synthesisable Verilog that achieves performance close to the theoretical limits of the DSP block, comparable with hand-optimised designs. We extend this tool to support proposed techniques for resource sharing of DSP blocks, adapting traditional approaches to the high latency of the DSP blocks, and also applying multi-pumping in this new context. This detailed design results in circuits that always operate close to the theoretical limits and offer full utilisation of the DSP block.
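
Illustration (not taken from the thesis): the idea of dynamic programmability described in the summary can be pictured with a small behavioural model in C. The sketch below assumes a simplified block with a pre-adder, a multiplier, and an accumulator register; the mode names, the `dsp_block` type, and the functions `dsp_step` and `shared_eval` are hypothetical and do not correspond to any vendor primitive or to the tool flow the thesis presents. It only shows how one block, given a different mode on each cycle, could compute an expression that would otherwise need two multipliers.

```c
#include <stdint.h>

/* Hypothetical operating modes. Real DSP blocks expose many more
   configurations through their control inputs. */
typedef enum {
    DSP_MUL,        /* p = a * b         */
    DSP_MUL_ADD,    /* p = a * b + c     */
    DSP_MUL_ACC,    /* p = p + a * b     */
    DSP_PREADD_MUL  /* p = (a + d) * b   */
} dsp_mode;

/* p models the output register, which also feeds the accumulate path. */
typedef struct {
    int64_t p;
} dsp_block;

/* One modelled clock cycle: the mode selects which function the block
   computes on this cycle. Pipeline latency is deliberately ignored. */
static int64_t dsp_step(dsp_block *blk, dsp_mode mode,
                        int64_t a, int64_t b, int64_t c, int64_t d)
{
    switch (mode) {
    case DSP_MUL:        blk->p = a * b;           break;
    case DSP_MUL_ADD:    blk->p = a * b + c;       break;
    case DSP_MUL_ACC:    blk->p = blk->p + a * b;  break;
    case DSP_PREADD_MUL: blk->p = (a + d) * b;     break;
    }
    return blk->p;
}

/* Evaluate (x + y) * z + w * v on a single shared block in two cycles,
   changing the mode between cycles. */
static int64_t shared_eval(int64_t x, int64_t y, int64_t z,
                           int64_t w, int64_t v)
{
    dsp_block blk = { 0 };
    dsp_step(&blk, DSP_PREADD_MUL, x, z, 0, y);      /* p = (x + y) * z */
    return dsp_step(&blk, DSP_MUL_ACC, w, v, 0, 0);  /* p = p + w * v   */
}
```

Calling `shared_eval(1, 2, 3, 4, 5)` walks the block through a pre-add-multiply cycle followed by a multiply-accumulate cycle. A real design would also have to schedule around the block's internal pipeline stages, which this model leaves out.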