Efficient FPGA-based sparse matrix-vector multiplication with data reuse-aware compression

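Sparse matrix-vector multiplication (SpMV) on FPGAs has gained much attention. The performance of SpMV is mainly determined by the number of multiplications between non-zero matrix elements and the corresponding vector values performed per cycle. On one hand, the off-chip memory bandwidth limits the number of non-zero matrix elements that can be transferred from the off-chip DDR memory to the FPGA chip per cycle. On the other hand, the irregular vector access pattern makes it difficult to fetch the corresponding vector values. In addition, the read-after-write (RAW) dependency in the accumulation process must be resolved to enable a fully pipelined design. In this work, we propose an efficient FPGA-based SpMV accelerator with data reuse-aware compression. The key observation is that repeated accesses to a vector value can be omitted by reusing data that has already been fetched. Based on this observation, we propose a reordering algorithm that explicitly exploits the reuse of fetched vector values. Further, we propose a novel compressed format, called data reuse-aware compressed (DRC), to take full advantage of this data reuse, together with a fast format conversion algorithm that shortens the preprocessing time. We also propose an HLS-friendly accumulator that resolves the RAW dependency. Finally, we implement and evaluate the proposed design on the Xilinx Zynq UltraScale+ ZCU106 platform with a set of sparse matrices from the SuiteSparse Matrix Collection. Compared with the state-of-the-art work, the proposed design achieves an average speedup of 1.18x without the DRC format and 1.57x with it.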
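To make the three bottlenecks named in the abstract concrete, the Python sketch below models them in software. It is an illustration under stated assumptions, not the authors' implementation: the DRC format, the paper's reordering algorithm, and its HLS accumulator are not reproduced. The function names `csr_spmv`, `vector_fetches`, `reorder_by_column`, and `interleaved_accumulate` are hypothetical; the reordering is a plain sort by column index swapped in for illustration, and the accumulator uses the generic technique of rotating over several partial sums to hide adder latency.

```python
# Hedged illustration of the SpMV bottlenecks described in the abstract.
# NOT the authors' DRC format, reordering algorithm, or HLS accumulator;
# every function name below is hypothetical.


def csr_spmv(values, col_idx, row_ptr, x):
    """Baseline CSR SpMV: one multiply per non-zero, irregular reads of x."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # x[col_idx[k]] is the data-dependent (irregular) vector access.
            y[r] += values[k] * x[col_idx[k]]
    return y


def vector_fetches(cols, lanes):
    """Return (fetches without reuse, fetches with reuse) when `lanes`
    non-zeros are consumed per cycle. With reuse, non-zeros in the same
    cycle that share a column index share a single fetch of x."""
    groups = [cols[i:i + lanes] for i in range(0, len(cols), lanes)]
    return len(cols), sum(len(set(g)) for g in groups)


def reorder_by_column(entries):
    """Simplistic stand-in for a reuse-aware reordering (assumption):
    sort (row, col, value) triples so equal column indices are adjacent."""
    return sorted(entries, key=lambda e: (e[1], e[0]))


def interleaved_accumulate(products, latency):
    """Generic way to avoid a RAW hazard on one accumulator: with an adder
    of `latency` cycles, rotate over `latency` partial sums so each one is
    updated only every `latency` cycles, then reduce them at the end."""
    partial = [0.0] * latency
    for i, p in enumerate(products):
        partial[i % latency] += p
    return sum(partial)


if __name__ == "__main__":
    # Toy matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
    values, col_idx, row_ptr = [1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5]
    print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 19.0]
    print(vector_fetches(col_idx, lanes=2))  # (5, 5): no reuse in CSR order
    entries = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
    cols = [c for _, c, _ in reorder_by_column(entries)]
    print(vector_fetches(cols, lanes=2))  # (5, 4): column 0 fetched once
    print(interleaved_accumulate(values, latency=3))  # 15.0
```

In this toy case, grouping the two column-0 non-zeros into the same cycle saves one vector fetch at two lanes per cycle. Rotating over partial sums changes the floating-point summation order, so on general inputs the result may differ from a strictly sequential sum in the last bits.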

Bibliographic Details
Main Authors:	Li, Shiqing; Liu, Di; Liu, Weichen
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Engineering::Computer science and engineering::Hardware; SpMV; FPGA; Data Reuse; Throughput
Online Access:	https://hdl.handle.net/10356/169152
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-169152
record_format	dspace
Record dates:	deposited 2023-07-05T05:09:01Z; last modified 2023-09-20T08:47:46Z
Funders:	Ministry of Education (MOE); Nanyang Technological University
Funding:	This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP (M4082282/04INS000515C130).
Version:	Submitted/Accepted version
Type:	Journal Article
Journal:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
ISSN:	0278-0070
DOI:	10.1109/TCAD.2023.3281715
Scopus EID:	2-s2.0-85161071197
Related data:	10.21979/N9/EXZ0Y3
Citation:	Li, S., Liu, D. & Liu, W. (2023). Efficient FPGA-based sparse matrix-vector multiplication with data reuse-aware compression. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2023.3281715
Rights:	© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCAD.2023.3281715.
File format:	application/pdf
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU

Similar Items