Application composition and communication optimization in iterative solvers using FPGAs

Bibliographic Details
Main Authors:	Rafique, Abid, Kapre, Nachiket, Constantinides, George A.
Other Authors:	School of Computer Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies
Online Access:	https://hdl.handle.net/10356/98626 http://hdl.handle.net/10220/17397
Institution:	Nanyang Technological University

Description
Summary:	We consider the problem of minimizing communication with off-chip memory and composing multiple linear algebra kernels in iterative solvers for large-scale eigenvalue problems and linear systems of equations. While GPUs may offer higher throughput for individual kernels, their overall application performance is limited by the inability to support on-chip sharing of data across kernels. In this paper, we show that higher on-chip memory capacity and superior on-chip communication bandwidth enable FPGAs to better support the composition of a sequence of kernels within these iterative solvers. We present a time-multiplexed FPGA architecture that exploits the on-chip capacity to store dependencies between kernels and the high communication bandwidth to move data. We propose a resource-constrained framework to select the optimal value of an algorithmic parameter that governs the tradeoff between communication and computation cost for a particular FPGA. Using the Lanczos Method as a case study, we show how to minimize communication on FPGAs through this tight algorithm-architecture interaction and achieve superior performance over a GPU despite the GPU's ~5x larger off-chip memory bandwidth and ~2x greater peak single-precision floating-point performance.
