Comparing soft and hard vector processing in FPGA-based embedded systems

Soft vector processors can augment and extend the capability of embedded hard vector processors in FPGA-based SoCs such as the Xilinx Zynq. We develop a compiler framework and an auto-tuning runtime that optimizes and executes data-parallel computation either on the scalar ARM processor, the embedde...

Bibliographic Details
Main Authors:	Soh, Jun Jie; Kapre, Nachiket
Other Authors:	School of Computer Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2015
Subjects:	Computer Science and Engineering
Online Access:	https://hdl.handle.net/10356/81218 http://hdl.handle.net/10220/39132
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-81218
record_format	dspace
spelling	sg-ntu-dr.10356-81218 2020-05-28T07:17:23Z
conference	2014 24th International Conference on Field Programmable Logic and Applications (FPL)
version	Accepted version
timestamps	2015-12-17T06:21:38Z 2019-12-06T14:25:48Z
type	2014 Conference Paper
citation	Soh, J. J., & Kapre, N. (2014). Comparing soft and hard vector processing in FPGA-based embedded systems. 2014 24th International Conference on Field Programmable Logic and Applications (FPL).
doi	10.1109/FPL.2014.6927467
rights	© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/FPL.2014.6927467].
extent	7 p.
mimetype	application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Computer Science and Engineering
description	Soft vector processors can augment and extend the capability of embedded hard vector processors in FPGA-based SoCs such as the Xilinx Zynq. We develop a compiler framework and an auto-tuning runtime that optimizes and executes data-parallel computation either on the scalar ARM processor, the embedded NEON engine or the Vectorblox MXP soft vector processor as appropriate. We consider computational conditions such as precision, vector length, chunk size, IO requirements under which soft vector processing can outperform scalar cores and hard vector blocks. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform the NEON engine by up to 3.95× while saving 9% dynamic power (0.1W absolute). Our compilation and runtime framework is also able to outperform the gcc NEON vectorizer under certain conditions by explicit generation of NEON intrinsics and performance tuning of the auto-generated data-parallel code.
author2	School of Computer Engineering
author_facet	School of Computer Engineering; Soh, Jun Jie; Kapre, Nachiket
format	Conference or Workshop Item
author	Soh, Jun Jie; Kapre, Nachiket
author_sort	Soh, Jun Jie
title	Comparing soft and hard vector processing in FPGA-based embedded systems
publishDate	2015
url	https://hdl.handle.net/10356/81218 http://hdl.handle.net/10220/39132
_version_	1681059422954061824


Similar Items