Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization

Performance of FPGA-based token dataflow architectures is often limited by the long tail distribution of parallelism in the compute paths of the dataflow graphs. This is known to limit speedup of dataflow processing of Sparse LU factorization to only 3-10x over CPUs. One reason behind the limitation...

Bibliographic Details
Main Authors:	Siddhartha; Kapre, Nachiket
Other Authors:	School of Computer Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2015
Subjects:	Computer Science and Engineering
Online Access:	https://hdl.handle.net/10356/81207 http://hdl.handle.net/10220/39179
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-81207
record_format	dspace
spelling	sg-ntu-dr.10356-81207; 2020-05-28T07:17:42Z; Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization; Siddhartha; Kapre, Nachiket; School of Computer Engineering; 2014 International Conference on Field-Programmable Technology (FPT); Computer Science and Engineering; Accepted version; 2015-12-18T08:50:34Z; 2019-12-06T14:23:38Z; 2014; Conference Paper; Siddhartha, & Kapre, N. (2014). Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization. 2014 International Conference on Field-Programmable Technology (FPT), 252-255; https://hdl.handle.net/10356/81207; http://hdl.handle.net/10220/39179; 10.1109/FPT.2014.7082787; en; © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: http://dx.doi.org/10.1109/FPT.2014.7082787; 8 p.; application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Computer Science and Engineering
description	Performance of FPGA-based token dataflow architectures is often limited by the long tail distribution of parallelism in the compute paths of the dataflow graphs. This is known to limit speedup of dataflow processing of Sparse LU factorization to only 3-10x over CPUs. One reason behind the limitations is the serialization penalty of processing high-fanout nodes in the dataflow graph on traditional dataflow processing architectures. In this paper, we show how to perform one-time static fanout decomposition and selective node replication transformations to input dataflow graphs. These transformations are one-time static compute costs that are typically amortized over millions of iterations. For dataflow graphs extracted for sparse LU factorization, we demonstrate up to 2.3x speedup (1.2x geomean average) with this technique across a range of benchmark problems.
author2	School of Computer Engineering
author_facet	School of Computer Engineering; Siddhartha; Kapre, Nachiket
format	Conference or Workshop Item
author	Siddhartha; Kapre, Nachiket
author_sort	Siddhartha
title	Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization
publishDate	2015
url	https://hdl.handle.net/10356/81207 http://hdl.handle.net/10220/39179
_version_	1681059020201263104
