Application composition and communication optimization in iterative solvers using FPGAs

We consider the problem of minimizing communication with off-chip memory and composition of multiple linear algebra kernels in iterative solvers for solving large-scale eigenvalue problems and linear systems of equations. While GPUs may offer higher throughput for individual kernels, overall applica...

Bibliographic Details
Main Authors:	Rafique, Abid; Kapre, Nachiket; Constantinides, George A.
Other Authors:	School of Computer Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies
Online Access:	https://hdl.handle.net/10356/98626 http://hdl.handle.net/10220/17397
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-98626
record_format	dspace
spelling	sg-ntu-dr.10356-98626 (2020-05-28T07:17:17Z). Application composition and communication optimization in iterative solvers using FPGAs. Rafique, Abid; Kapre, Nachiket; Constantinides, George A. School of Computer Engineering. IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (21st : 2013 : Seattle, Washington, US). DRNTU::Engineering::Computer science and engineering::Computing methodologies. Accepted version. 2013-11-07T07:31:03Z 2019-12-06T19:57:54Z. 2013. Conference Paper. Rafique, A., Kapre, N., & Constantinides, G. A. (2013). Application Composition and Communication Optimization in Iterative Solvers Using FPGAs. 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 153-160. https://hdl.handle.net/10356/98626 http://hdl.handle.net/10220/17397 10.1109/FCCM.2013.16 en © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/FCCM.2013.16] application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies
description	We consider the problem of minimizing communication with off-chip memory and composing multiple linear algebra kernels in iterative solvers for large-scale eigenvalue problems and linear systems of equations. While GPUs may offer higher throughput for individual kernels, overall application performance is limited by the inability to support on-chip sharing of data across kernels. In this paper, we show that higher on-chip memory capacity and superior on-chip communication bandwidth enable FPGAs to better support the composition of a sequence of kernels within these iterative solvers. We present a time-multiplexed FPGA architecture that exploits the on-chip capacity to store dependencies between kernels and the high communication bandwidth to move data. We propose a resource-constrained framework to select the optimal value of an algorithmic parameter that governs the tradeoff between communication and computation cost for a particular FPGA. Using the Lanczos method as a case study, we show how to minimize communication on FPGAs through this tight algorithm-architecture interaction and obtain superior performance over a GPU despite its ~5x larger off-chip memory bandwidth and ~2x greater peak single-precision floating-point performance.
author2	School of Computer Engineering
author_facet	School of Computer Engineering; Rafique, Abid; Kapre, Nachiket; Constantinides, George A.
format	Conference or Workshop Item
author	Rafique, Abid; Kapre, Nachiket; Constantinides, George A.
author_sort	Rafique, Abid
title	Application composition and communication optimization in iterative solvers using FPGAs
publishDate	2013
url	https://hdl.handle.net/10356/98626 http://hdl.handle.net/10220/17397
_version_	1681056053278539776
