VLIW-SCORE: Beyond C for sequential control of SPICE FPGA acceleration

Many stand-alone, FPGA-based accelerators separate the implementation of a computation into two components - (1) a large parallel component that is realized as hardware on spatial FPGA fabric and (2) a small control and co-ordination component that is realized as software on embedded soft-core proce...

Full description

Saved in:
Bibliographic Details
Main Authors: Kapre, Nachiket, DeHon, Andre
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language:English
Published: 2015
Subjects:
Online Access:https://hdl.handle.net/10356/81249
http://hdl.handle.net/10220/39204
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Many stand-alone, FPGA-based accelerators separate the implementation of a computation into two components - (1) a large parallel component that is realized as hardware on spatial FPGA fabric and (2) a small control and co-ordination component that is realized as software on embedded soft-core processors like an off-the-shelf Xilinx Microblaze (or host offchip CPU). While this hardware-software partitioning methodology allows the designer to lower design effort when composing the accelerator system, it introduces unnecessary Amdahl's Law bottlenecks and limits scalability. In this paper, we show how to avoid these limitations with VLIW-SCORE: a combination of a high-level parallel programming framework called SCORE and a custom, hybrid VLIW hardware organization. We demonstrate the benefits of this methodology for the SPICE circuit simulator when implementing the simulation control algorithms. With our spatial mapping flow we are able to improve performance by ≈30% (mean across circuit benchmarks) when compared to the Microblaze implementation for the Xilinx Virtex-6 LX760 FPGA. For complete application acceleration, we see an improved speedup from 1.9× for the Microblaze-based design to 2.6× for the hybrid, custom VLIW implementation when comparing a Xilinx Virtex-6 LX760 FPGA (40nm) with an Intel Core i7 965 CPU (45nm).