Design automation flow for partial run-time reconfiguration on FPGAs


Bibliographic Details
Main Author: Mao, Fubing
Other Authors: Lam Siew Kei
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/73042
Institution: Nanyang Technological University
Description
Summary: A Field-Programmable Gate Array (FPGA) is programmable hardware that can be configured after manufacturing to meet application-specific functionality and requirements. Partial Reconfiguration (PR) is an advanced feature of modern FPGAs that allows the configuration of the FPGA to be altered at run time. This makes it possible to maximize the utilization of limited FPGA resources to support more functions and to shorten the time to market of the product. However, there is a lack of efficient computer-aided design (CAD) tools for placement and routing that support partial reconfiguration on FPGAs. Traditional approaches usually rely on manual partitioning and placement, which is a tedious and error-prone process that requires considerable effort and a long development cycle because of the large design space. While the traditional FPGA design flow usually employs fine-grained tile-based placement, modular placement is increasingly required to speed up large-scale placement and reduce synthesis time. Moreover, commonly used modules can be pre-synthesized and stored in a library for design reuse, which significantly lowers design time, verification time and development cost. To address these problems, this research proposes an automatic mapping flow for efficient PR. The thesis makes three major contributions. Firstly, we propose a library-based placement and routing flow that reuses pre-placed and pre-routed modules from the library to significantly reduce execution time, while considering the area-delay product of each module at different width-height ratios and optimizing the area and delay of the final design. The flow supports both static and reconfigurable modules. The modular information is represented in a B*-tree structure, and the B*-tree operations are amended and used with Simulated Annealing (SA) to enable rapid exploration of the placement space. Different width-height ratios of the modules are exploited to optimize area and delay. Partial reconfiguration-aware routing using pin-to-wire abutment is proposed to connect the modules after placement. By reusing the module information stored in the library for the target architecture, our placer reduces compilation time on average, with acceptable area and delay overhead, compared with tile-based results from the Versatile Place and Route (VPR) tool. Secondly, we propose a dynamic module partitioning approach for the library-based design flow that generates appropriately shaped modules from the single-ratio modules in the library while efficiently utilizing the pre-placement module information. A set of rules is developed to select the most suitable module and determine the partition so as to minimize the area and delay of the placement without significantly increasing the synthesis time. The proposed approach can adapt to different architectures and also addresses the fixed-outline constraint. Experimental results show that our approach can reduce the area by up to 10% with a marginal increase in delay and acceptable runtime. Finally, we explore automatic workflow mapping in interposer-based multi-FPGA systems. We propose a two-stage modular placement flow for interposer-based multiple FPGAs that targets delay optimization and incorporates a detailed interposer routing model for wirelength and delay estimation. In the first stage, we adopt the force-directed method, chosen for its global view of the design, to obtain an efficient solution as the starting point of the placement. In the second stage, we employ simulated annealing, chosen for its efficiency and effectiveness, to refine the initial solution. To speed up the refinement process, the hierarchical B*-tree (HB*-tree) is employed to enable fast search and convergence. The experimental results demonstrate that our flow achieves an efficient solution in reasonable time and scales well across different design sizes.
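
The combination of a B*-tree representation with simulated annealing mentioned in the summary can be illustrated with a small, generic sketch. This is not the thesis's placer: the names (`Node`, `build_tree`, `pack`, `anneal`), the module sizes, the cooling schedule and the area-only cost are all assumptions made for illustration, and delay, routing and the amended B*-tree operations are not modelled. In a B*-tree, a left child is placed immediately to the right of its parent and a right child is placed above its parent at the same x coordinate.

```python
import math
import random

class Node:
    """One module in a B*-tree (hypothetical illustration, not the thesis code)."""
    def __init__(self, name, w, h):
        self.name, self.w, self.h = name, w, h
        self.left = self.right = None
        self.x = self.y = 0

def build_tree(modules):
    """Link modules in heap order: node i gets children 2i+1 (left) and 2i+2 (right)."""
    nodes = [Node(*m) for m in modules]
    for i, n in enumerate(nodes):
        if 2 * i + 1 < len(nodes):
            n.left = nodes[2 * i + 1]
        if 2 * i + 2 < len(nodes):
            n.right = nodes[2 * i + 2]
    return nodes[0]

def pack(root):
    """Assign coordinates in pre-order and return the placed nodes."""
    placed, stack = [], [(root, 0)]
    while stack:
        node, x = stack.pop()
        # Drop the module onto the highest placed module it overlaps horizontally.
        node.x = x
        node.y = max((p.y + p.h for p in placed
                      if p.x < x + node.w and x < p.x + p.w), default=0)
        placed.append(node)
        if node.right:                      # right child: above, same x
            stack.append((node.right, x))
        if node.left:                       # left child: to the right of this node
            stack.append((node.left, x + node.w))
    return placed

def area(placed):
    """Bounding-box area of a packed placement."""
    return max(p.x + p.w for p in placed) * max(p.y + p.h for p in placed)

def apply_move(kind, a, b):
    """Rotate module a, or swap the modules held by a and b. Both moves undo themselves."""
    if kind == "rotate":
        a.w, a.h = a.h, a.w
    else:
        a.name, b.name = b.name, a.name
        a.w, b.w = b.w, a.w
        a.h, b.h = b.h, a.h

def anneal(root, steps=2000, t0=50.0, alpha=0.995, seed=0):
    """Simulated annealing over rotations and swaps; returns the best area found."""
    rng = random.Random(seed)
    nodes = pack(root)
    cost = best = area(nodes)
    best_state = [(n.name, n.w, n.h) for n in nodes]
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(nodes, 2)
        kind = rng.choice(["rotate", "swap"])
        apply_move(kind, a, b)
        new_cost = area(pack(root))
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best:
                best, best_state = cost, [(n.name, n.w, n.h) for n in nodes]
        else:
            apply_move(kind, a, b)          # undo the rejected move
        temp *= alpha
    for n, (name, w, h) in zip(nodes, best_state):
        n.name, n.w, n.h = name, w, h       # restore the best configuration seen
    pack(root)
    return best
```

For example, `anneal(build_tree([("A", 4, 2), ("B", 3, 3), ("C", 2, 5)]))` returns the smallest bounding-box area found and leaves the coordinates of that placement on the nodes. Rotation here stands in for choosing between two width-height ratios of a module. The tree shape is kept fixed to keep the sketch short, whereas a fuller B*-tree placer would typically also move nodes within the tree.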