Intelligent high level synthesis for customization on reconfigurable platforms

Bibliographic Details
Main Author:	Sharad Sinha
Other Authors:	Thambipillai Srikanthan
Format:	Theses and Dissertations
Language:	English
Published:	2014
Subjects:	DRNTU::Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures; DRNTU::Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation
Online Access:	https://hdl.handle.net/10356/61691
Institution:	Nanyang Technological University

Description
Summary:	High level synthesis (HLS) using C/C++ has increasingly become a critical step in the realization of complex digital systems. A major research focus in this space has been the efficient synthesis of complex systems without violating stringent time-to-market constraints. Most related work in the literature has been confined to supporting C/C++ language constructs, scheduling operations, and hardware binding for area reduction. The main motivation of the research presented in this thesis is to develop novel algorithms for intelligent high level synthesis without designer intervention. Using the timing constraint as a knob, an algorithm called Extended Compatibility Path Based Binding (ECPB) has been proposed for resource sharing during hardware binding to minimize area utilization. The proposed method can be automated and yields, on average, a 12.49% and 29.21% lower area-delay product than compatibility path based (CPB) and weighted bipartite matching (WBM) based binding, respectively. Building upon ECPB, a latency-preserving algorithm for area-delay optimization has been proposed to reduce area without violating the timing constraint. A data-initiation-interval-aware graph partitioning algorithm has been proposed to partition an application’s dataflow graph. This has also paved the way for further area reduction by invoking resource sharing without violating the initiation interval constraints. In addition, a systematic technique for area-delay trade-off analysis has been developed to establish multiple design points. The proposed method for initiation-interval-aware area optimization can be deployed to analyze large dataflow graphs efficiently, making it highly scalable. A technique for the efficient utilization of the DSP resources available in FPGA platforms has been proposed to maximize application performance.
It relies on a systematic search for multiplication and allied operations within the frequently executed code blocks of an application. Model-based inferences for different types of multiplication were developed to facilitate the rapid identification of profitable regions for maximizing performance. Investigations confirm that the operating clock frequency can be increased by up to 3 times compared with a commercial HLS tool (Vivado-HLS). To combine the strengths of IP-core based design and high level synthesis, concepts of program recognition and automatic algorithm replacement are used to develop lexical and pattern based analysis. This has led to the automatic identification of built-in arithmetic functions in a C/C++ application and of compiler specific patterns. A novel IP-core selection algorithm has been proposed to facilitate binding to the available IP cores. Our investigations confirm that it yields a notable area reduction compared with that achievable using a commercial HLS tool (Vivado-HLS). In addition, multiple design points can be generated to facilitate area-delay trade-off analysis by associating a combination of IP cores at a time. Our investigations show that the look-up table (LUT) reduction ranges from 60% to 75%, while the clock period reduction ranges from 16% to 40%, for the benchmarks investigated. While the proposed methods for an intelligent high-level synthesis flow are applicable across all application domains, digital signal and information processing applications benefit greatly because they contain operations such as multiplication and transcendental functions. Additionally, since these methods look for application characteristics and exploit architecture specifics, they lead to customized synthesis solutions.
Finally, the proposed methods have contributed to the realization of an intelligent high-level synthesis framework, which paves the way for less reliance on hand-crafted designs and skilled hardware designers.

Similar Items