Heterogeneous dataflow architectures for FPGA-based sparse LU factorization
FPGA-based token dataflow architectures with heterogeneous computation and communication subsystems can accelerate hard-to-parallelize, irregular computations in sparse LU factorization. We combine software pre-processing and architecture customization to fully expose and exploit the underlying hete...
Main Authors: | Siddhartha; Kapre, Nachiket |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2015 |
Subjects: | Computer Science and Engineering |
Online Access: | https://hdl.handle.net/10356/81188 http://hdl.handle.net/10220/39175 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-81188 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-811882020-05-28T07:17:51Z Heterogeneous dataflow architectures for FPGA-based sparse LU factorization Siddhartha Kapre, Nachiket School of Computer Engineering 2014 24th International Conference on Field Programmable Logic and Applications (FPL) Computer Science and Engineering FPGA-based token dataflow architectures with heterogeneous computation and communication subsystems can accelerate hard-to-parallelize, irregular computations in sparse LU factorization. We combine software pre-processing and architecture customization to fully expose and exploit the underlying heterogeneity in the factorization algorithm. We perform a one-time pre-processing of the sparse matrices in software to generate dataflow graphs that capture raw parallelism in the computation through substitution and reassociation transformations. We customize the dataflow architecture by picking the right mixture of addition and multiplication processing elements to match the observed balance in the dataflow graphs. Additionally, we modify the network-on-chip to route certain critical dependencies on a separate, faster communication channel while relegating less-critical traffic to the existing channels. Using our techniques, we show how to achieve speedups of up to 37% over existing state-of-the-art FPGA-based sparse LU factorization systems that can already run 3-4× faster than CPU-based sparse LU solvers using the same hardware constraints. Accepted version 2015-12-18T08:29:58Z 2019-12-06T14:23:14Z 2015-12-18T08:29:58Z 2019-12-06T14:23:14Z 2014 Conference Paper Siddhartha., & Kapre, N. (2014). Heterogeneous dataflow architectures for FPGA-based sparse LU factorization. 2014 24th International Conference on Field Programmable Logic and Applications (FPL), 1-4. https://hdl.handle.net/10356/81188 http://hdl.handle.net/10220/39175 10.1109/FPL.2014.6927401 en © 2014 IEEE. Personal use of this material is permitted. 
Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/FPL.2014.6927401]. 4 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Computer Science and Engineering |
spellingShingle |
Computer Science and Engineering Siddhartha Kapre, Nachiket Heterogeneous dataflow architectures for FPGA-based sparse LU factorization |
description |
FPGA-based token dataflow architectures with heterogeneous computation and communication subsystems can accelerate hard-to-parallelize, irregular computations in sparse LU factorization. We combine software pre-processing and architecture customization to fully expose and exploit the underlying heterogeneity in the factorization algorithm. We perform a one-time pre-processing of the sparse matrices in software to generate dataflow graphs that capture raw parallelism in the computation through substitution and reassociation transformations. We customize the dataflow architecture by picking the right mixture of addition and multiplication processing elements to match the observed balance in the dataflow graphs. Additionally, we modify the network-on-chip to route certain critical dependencies on a separate, faster communication channel while relegating less-critical traffic to the existing channels. Using our techniques, we show how to achieve speedups of up to 37% over existing state-of-the-art FPGA-based sparse LU factorization systems that can already run 3-4× faster than CPU-based sparse LU solvers using the same hardware constraints. |
author2 |
School of Computer Engineering |
author_facet |
School of Computer Engineering Siddhartha Kapre, Nachiket |
format |
Conference or Workshop Item |
author |
Siddhartha Kapre, Nachiket |
author_sort |
Siddhartha |
title |
Heterogeneous dataflow architectures for FPGA-based sparse LU factorization |
title_short |
Heterogeneous dataflow architectures for FPGA-based sparse LU factorization |
title_full |
Heterogeneous dataflow architectures for FPGA-based sparse LU factorization |
title_fullStr |
Heterogeneous dataflow architectures for FPGA-based sparse LU factorization |
title_full_unstemmed |
Heterogeneous dataflow architectures for FPGA-based sparse LU factorization |
title_sort |
heterogeneous dataflow architectures for fpga-based sparse lu factorization |
publishDate |
2015 |
url |
https://hdl.handle.net/10356/81188 http://hdl.handle.net/10220/39175 |
_version_ |
1681059256617402368 |
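
The description above mentions two steps that a small example may make more concrete: reassociating the computation to expose parallelism in the dataflow graph, and choosing a mixture of addition and multiplication processing elements (PEs) that matches the graph's operation balance. The Python sketch below is illustrative only and is not the authors' implementation. The function names (`reassociate_sum`, `count_ops`, `pick_pe_mix`), the tuple-based graph encoding, and the proportional allocation rule are all assumptions made for this example.

```python
from collections import Counter


def reassociate_sum(terms):
    """Combine `terms` into a balanced addition tree.

    Returns (root, depth), where depth counts addition levels only.
    A left-to-right chain over n terms has n - 1 levels; the balanced
    tree has about log2(n).
    """
    level = [(t, 0) for t in terms]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (a, da), (b, db) = level[i], level[i + 1]
            nxt.append((("add", a, b), max(da, db) + 1))
        if len(level) % 2:  # odd term out is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def count_ops(node, counts=None):
    """Count operations by type in a graph of (op, lhs, rhs) tuples."""
    counts = Counter() if counts is None else counts
    if isinstance(node, tuple):
        op, lhs, rhs = node
        counts[op] += 1
        count_ops(lhs, counts)
        count_ops(rhs, counts)
    return counts


def pick_pe_mix(counts, total_pes):
    """Split PEs in proportion to add/mul counts (assumed heuristic)."""
    total_ops = counts["add"] + counts["mul"]
    add_pes = round(total_pes * counts["add"] / total_ops)
    add_pes = min(max(add_pes, 1), total_pes - 1)  # keep one of each kind
    return {"add": add_pes, "mul": total_pes - add_pes}


# Hypothetical inner-product update: sum of l_k * u_k for k = 0..7.
products = [("mul", f"l{k}", f"u{k}") for k in range(8)]
root, depth = reassociate_sum(products)
counts = count_ops(root)
mix = pick_pe_mix(counts, total_pes=16)
print(depth, dict(counts), mix)
```

For the eight products in this example, the balanced tree needs three addition levels where a sequential chain would need seven. The record does not say how the authors select the PE mixture, so the proportional split here stands in for whatever selection procedure the paper actually uses.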