MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations

Mixed-precision implementation of computation can deliver area, throughput and power improvements for dataflow computations over homogeneous fixed-precision circuits without any loss in accuracy. When designing circuits for reconfigurable hardware, we can exercise independent control over bitwidth s...

Full description

Saved in:
Bibliographic Details
Main Authors: Ye, Deheng, Kapre, Nachiket
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language:English
Published: 2015
Subjects:
Online Access:https://hdl.handle.net/10356/81246
http://hdl.handle.net/10220/39194
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Mixed-precision implementation of computation can deliver area, throughput and power improvements for dataflow computations over homogeneous fixed-precision circuits without any loss in accuracy. When designing circuits for reconfigurable hardware, we can exercise independent control over bitwidth selection of each variable in the computation. However, selecting the best precision for each variable is an NP-hard problem. While traditional solutions use automated heuristics like simulated annealing or integer linear programming, they still rely on the manual formulation of resource models, which can be tedious, and potentially inaccurate due to the unpredictable interactions between different stages of the FPGA CAD flow. We develop MixFX-SCORE, an automated tool-flow based on FX-SCORE fixed-point compilation framework and simulated annealing, to address this challenge. We outsource error analysis (Gappa++) and resource model generation (Vivado HLS, Logic Synthesis, Xilinx Place-and-Route) to external tools that offer a more accurate representation of error behavior (backed by proofs) and resource usage (based on actual utilization). We demonstrate 1.1-3.5x LUTs count savings, 1-1.8x DSP count reductions, and 1-3.9x dynamic power improvements while still satisfying the accuracy constraints when compared to homogeneous fixed-point implementations.