On Data Forwarding in Deeply Pipelined Soft Processors
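We can design high-frequency soft-processors on FPGAs that exploit deep pipelining of DSP primitives, supported by selective data forwarding, to deliver up to 25% performance improvements across a range of benchmarks. Pipelined, in-order, scalar processors can be small and lightweight but suffer from a large number of idle cycles due to dependency chains in the instruction sequence. Data forwarding allows us to more deeply pipeline the processor stages while avoiding an associated increase in the NOP cycles between dependent instructions. Full forwarding can be prohibitively complex for a lean soft processor, so we explore two approaches: an external forwarding path around the DSP block execution unit in FPGA logic and using the intrinsic loopback path within the DSP block primitive. We show that internal loopback improves performance by 5% compared to external forwarding, and up to 25% over no data forwarding. The result is a processor that runs at a frequency close to the fabric limit of 500 MHz, but without the significant dependency overheads typical of such processors.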
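The abstract's central argument is that forwarding lets a processor be pipelined more deeply without a matching increase in NOP cycles between dependent instructions. A minimal sketch can make that trade-off concrete. It assumes a hypothetical in-order, single-issue pipeline in which a dependent instruction may issue no earlier than `latency` cycles after its producer. The `Instr` type, the `schedule` function, the toy program, and the latency values are all illustrative assumptions. They do not describe the paper's processor or its actual pipeline depths.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction in a toy register-to-register ISA (hypothetical)."""
    dest: str
    srcs: tuple[str, ...]


def schedule(program: list[Instr], latency: int) -> tuple[int, int]:
    """Issue `program` in order, one instruction per cycle, padding with NOPs.

    `latency` is the assumed number of cycles after a producer issues before a
    dependent instruction may issue (1 = back-to-back with no penalty).
    Returns (issue cycles including NOPs, number of NOPs inserted).
    """
    ready: dict[str, int] = {}  # register -> earliest cycle a reader may issue
    prev = -1                   # issue cycle of the previous instruction
    nops = 0
    for ins in program:
        earliest = prev + 1
        for reg in ins.srcs:
            # Registers never written by the program are available at cycle 0.
            earliest = max(earliest, ready.get(reg, 0))
        nops += earliest - (prev + 1)  # empty issue slots filled with NOPs
        prev = earliest
        ready[ins.dest] = prev + latency
    return prev + 1, nops


# Toy program: a three-instruction dependency chain, one independent
# instruction, and a final instruction that consumes both results.
PROGRAM = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r1",)),
    Instr("r3", ("r2",)),
    Instr("r4", ("r0",)),
    Instr("r5", ("r3", "r4")),
]

if __name__ == "__main__":
    # Hypothetical latencies: a larger value stands for a deeper pipeline
    # with a longer path from a result back to the operand inputs.
    for latency in (5, 3, 1):
        cycles, nops = schedule(PROGRAM, latency)
        print(f"latency {latency}: {cycles} issue cycles, {nops} NOPs")
```

With these made-up latencies, the script reports 17, 11, and 5 issue cycles (12, 6, and 0 NOPs). In this model, shortening the result-to-operand path cuts the padding on dependency chains. The model deliberately ignores clock frequency, which is the reason to pipeline deeply in the first place, so it cannot reproduce the paper's reported 5% and 25% improvements.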
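Main Authors: | Kapre, Nachiket; Cheah, Hui Yan; Fahmy, Suhaib A. |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2015 |
Subjects: | Field programmable gate arrays; soft processors; digital signal processing |
Online Access: | https://hdl.handle.net/10356/81226 http://hdl.handle.net/10220/39195 |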
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-81226 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-81226 (2020-05-28T07:18:21Z); On Data Forwarding in Deeply Pipelined Soft Processors; Kapre, Nachiket; Cheah, Hui Yan; Fahmy, Suhaib A.; School of Computer Engineering; Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '15; Field programmable gate arrays; soft processors; digital signal processing; Accepted version; 2015-12-21T07:39:40Z; 2019-12-06T14:25:58Z; 2015; Conference Paper; Citation: Cheah, H. Y., Fahmy, S. A., & Kapre, N. (2015). On Data Forwarding in Deeply Pipelined Soft Processors. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '15, 181-189; https://hdl.handle.net/10356/81226 http://hdl.handle.net/10220/39195; DOI: 10.1145/2684746.2689067; en; © 2015 Association for Computing Machinery (ACM). This is the author created version of a work that has been peer reviewed and accepted for publication by Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Association for Computing Machinery (ACM). It incorporates referee’s comments, but changes resulting from the publishing process, such as copyediting and structural formatting, may not be reflected in this document. The published version is available at: http://dx.doi.org/10.1145/2684746.2689067; 9 p.; application/pdf |
building | NTU Library |
country | Singapore |
collection | DR-NTU |
topic | Field programmable gate arrays; soft processors; digital signal processing |
description | We can design high-frequency soft-processors on FPGAs that exploit deep pipelining of DSP primitives, supported by selective data forwarding, to deliver up to 25% performance improvements across a range of benchmarks. Pipelined, in-order, scalar processors can be small and lightweight but suffer from a large number of idle cycles due to dependency chains in the instruction sequence. Data forwarding allows us to more deeply pipeline the processor stages while avoiding an associated increase in the NOP cycles between dependent instructions. Full forwarding can be prohibitively complex for a lean soft processor, so we explore two approaches: an external forwarding path around the DSP block execution unit in FPGA logic and using the intrinsic loopback path within the DSP block primitive. We show that internal loopback improves performance by 5% compared to external forwarding, and up to 25% over no data forwarding. The result is a processor that runs at a frequency close to the fabric limit of 500 MHz, but without the significant dependency overheads typical of such processors. |
author2 | School of Computer Engineering |
author | Kapre, Nachiket; Cheah, Hui Yan; Fahmy, Suhaib A. |
publishDate | 2015 |