Rapid design exploration framework for realizing custom computing systems on FPGAs

Field Programmable Gate Arrays (FPGAs) have now become one of the most preferred computing platforms for implementing configurable system-on-chip despite the challenges in meeting the cost, performance and energy requirements of embedded systems. The main driver for the proliferation of FPGAs lies i...

Bibliographic Details
Main Author: Aung, Yan Lin
Other Authors: Thambipillai Srikanthan
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/66338
Institution: Nanyang Technological University
id sg-ntu-dr.10356-66338
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
description Field Programmable Gate Arrays (FPGAs) have become one of the most preferred computing platforms for implementing configurable systems-on-chip, despite the challenges in meeting the cost, performance and energy requirements of embedded systems. The main driver for the proliferation of FPGAs lies in the demand for shorter time-to-market and lower non-recurring engineering costs. However, the lack of new design methodologies and techniques that can effectively leverage the various hardware (reconfigurable logic and digital signal processing blocks) and software (soft and hard processor cores) computational resources in modern FPGAs remains the bottleneck. The main aim of this research is to develop a constraint-aware design exploration framework for modern FPGA systems for the rapid realization of custom computing solutions.
An efficient technique for high-level software performance estimation has been proposed that does not require executing the application on the target processor or on an instruction set simulator. The proposed technique incorporates dynamic characteristics of the target processor, such as branch penalty and data dependency, in order to achieve high estimation accuracy. In addition, a novel control-flow mapping strategy has been introduced to realize the rapid estimation of compiler-optimized software. Experimental results based on the widely used CHStone benchmark suite show that the proposed technique can be reliably used to estimate software performance on a PowerPC processor with an average estimation error of only 6%.
In order to rapidly estimate the look-up table (LUT) utilization of custom hardware data-paths, a technology-mapping aware clustering technique has been proposed. Unlike existing work, the proposed technique takes into account the synthesis optimizations and technology mapping that commercial FPGA synthesis tools rely upon. Experimental results show that the proposed area estimation technique is able to estimate the LUT utilization of the data-paths with an average estimation error of 9% for Altera Cyclone II FPGAs (outperforming an existing technique by 29%) and 7% for Stratix IV FPGAs. A regression-based technique to estimate the LUT utilization of finite state machine (FSM) based controllers has also been proposed. Multiple linear curve fitting was applied to obtain the parameters for the proposed regression model. Experimental results show that the regression-based technique is able to estimate the LUT utilization of the FSMs with an average estimation error of 9% and achieves a 24% improvement over an existing analytical technique. In addition, a strategy to estimate the utilization of on-chip digital signal processing (DSP) blocks for different types of multiply operators, based on synthesis inference models, has been developed. This provides for the efficient incorporation of DSP blocks during the estimation process. It has been successfully demonstrated that the proposed technique is capable of estimating DSP and LUT utilization for various multiply operators with an accuracy of 100% in almost all cases.
In order to estimate the critical path delay and cycle counts of custom hardware accelerators, a high-level estimation technique that relies on the technology-mapping aware clustering algorithm has been proposed. The proposed technique takes into consideration the synthesis optimizations employed by the commercial FPGA design tool in order to increase the estimation accuracy. Evaluations based on the hardware accelerators from the widely used CHStone benchmark suite show that the proposed technique is able to estimate the critical path delays with an average estimation error of 8% and 14% for Altera Cyclone II and Stratix IV FPGAs, respectively. It is noteworthy that the run-time of the proposed area-time estimation technique is in the order of milliseconds, yielding a speed-up of three orders of magnitude over the commercial FPGA synthesis process while still providing reasonably accurate area-time estimates.
A communication-aware hardware-software partitioning algorithm has been devised for identifying the profitable candidate blocks for hardware acceleration. It is a hybrid technique, named KnapSim, based on 0-1 Knapsack and modified Simulated Annealing. The KnapSim algorithm can achieve near-optimal solutions at significantly lower run-time than an existing state-of-the-art genetic algorithm based approach. The proposed partitioning algorithm is used to realize a design exploration framework for constraint-aware (i.e. FPGA LUTs and DSP blocks) performance optimization. A case study using a widely used application demonstrates that the proposed framework is capable of rapid design exploration without executing compiled code or performing FPGA implementation. Finally, the proposed framework can be readily integrated with commercial FPGA toolchains in order to cope with the design exploration challenges associated with complex embedded computing applications.
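The partitioning step in the description selects candidate blocks for hardware acceleration under FPGA resource constraints. The following Python is a minimal sketch of only the 0-1 knapsack part of that idea. It is not the thesis's KnapSim algorithm, which also uses modified simulated annealing and accounts for communication cost. The function name `select_blocks`, the block names, the cycle savings and the LUT costs are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# Each candidate block: (name, cycles_saved_if_in_hardware, lut_cost).
# All names and numbers here are hypothetical, not taken from the thesis.
Block = Tuple[str, int, int]


def select_blocks(blocks: List[Block], lut_budget: int) -> Tuple[int, List[str]]:
    """Pick blocks to move to hardware so that total cycles saved is
    maximal while the summed LUT cost stays within lut_budget.

    Classic 0-1 knapsack by dynamic programming; communication cost and
    DSP-block constraints are deliberately ignored in this sketch.
    """
    n = len(blocks)
    # best[i][b] = max cycles saved using the first i blocks within b LUTs
    best = [[0] * (lut_budget + 1) for _ in range(n + 1)]
    for i, (_, saved, cost) in enumerate(blocks, start=1):
        for b in range(lut_budget + 1):
            best[i][b] = best[i - 1][b]  # block i stays in software
            if cost <= b:                # block i goes to hardware
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + saved)

    # Walk the table backwards to recover which blocks were selected.
    chosen: List[str] = []
    b = lut_budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, _, cost = blocks[i - 1]
            chosen.append(name)
            b -= cost
    chosen.reverse()
    return best[n][lut_budget], chosen


if __name__ == "__main__":
    candidates = [
        ("block_a", 500, 40),
        ("block_b", 300, 30),
        ("block_c", 450, 50),
        ("block_d", 200, 20),
    ]
    saved, picked = select_blocks(candidates, lut_budget=100)
    print(saved, picked)
```

With a budget of 100 LUTs, this sketch picks `block_a`, `block_b` and `block_d` (90 LUTs, 1000 cycles saved) over any combination containing `block_c`. An exact dynamic program like this is only practical for small budgets and a single resource, which is one plausible reason to pair a knapsack formulation with a heuristic search when several constraints and communication overheads are involved.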
author2 Thambipillai Srikanthan
author_facet Thambipillai Srikanthan
Aung, Yan Lin
format Theses and Dissertations
author Aung, Yan Lin
author_sort Aung, Yan Lin
title Rapid design exploration framework for realizing custom computing systems on FPGAs
publishDate 2016
url http://hdl.handle.net/10356/66338
_version_ 1759857953575272448
spelling sg-ntu-dr.10356-66338 2023-03-04T00:29:20Z Rapid design exploration framework for realizing custom computing systems on FPGAs Aung, Yan Lin Thambipillai Srikanthan School of Computer Engineering Centre for High Performance Embedded Systems DRNTU::Engineering Doctor of Philosophy (SCE) 2016-03-29T08:37:26Z 2016-03-29T08:37:26Z 2016 Thesis http://hdl.handle.net/10356/66338 en 205 p. application/pdf