Comparing soft and hard vector processing in FPGA-based embedded systems

Bibliographic Details
Main Authors: Soh, Jun Jie, Kapre, Nachiket
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language: English
Published: 2015
Online Access: https://hdl.handle.net/10356/81218
http://hdl.handle.net/10220/39132
Institution: Nanyang Technological University
Description
Summary: Soft vector processors can augment and extend the capability of embedded hard vector processors in FPGA-based SoCs such as the Xilinx Zynq. We develop a compiler framework and an auto-tuning runtime that optimizes and executes data-parallel computation on the scalar ARM processor, the embedded NEON engine, or the Vectorblox MXP soft vector processor, as appropriate. We consider the computational conditions, such as precision, vector length, chunk size, and I/O requirements, under which soft vector processing can outperform scalar cores and hard vector blocks. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform the NEON engine by up to 3.95× while saving 9% dynamic power (0.1 W absolute). Our compilation and runtime framework is also able to outperform the gcc NEON vectorizer under certain conditions through explicit generation of NEON intrinsics and performance tuning of the auto-generated data-parallel code.