Enhancing FPGA-based RDMA-enabled SmartNICs for scale-out computing
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/173997 |
Institution: | Nanyang Technological University |
Summary: | To cope with rapid data growth and expanding workloads and applications, modern data centers are equipped with thousands of network-connected hosts, each featuring CPUs and accelerators such as ASICs, FPGAs and GPUs. The resulting demand for high-speed networking has driven the emergence of SmartNIC technologies and Remote Direct Memory Access (RDMA). This thesis focuses on FPGA-based RDMA-enabled SmartNICs, specifically on enhancing features of RecoNIC, an open-source adaptive SmartNIC platform from AMD.
RecoNIC provides a platform for implementing FPGA accelerators and allows those accelerators to initiate RDMA read/write requests, with communication handled through the host. Relying on host CPUs for RDMA control operations typically results in higher read/write latency, especially when transmitting small messages. This work extends RecoNIC so that its RDMA offloading engine is shared between the host CPU and FPGA accelerators, and adds FPGA-side DRAM support. Moreover, instead of relying on host CPUs to control the RDMA engine, control operations are offloaded onto FPGA accelerators implemented via high-level synthesis (HLS), significantly reducing RDMA latency, which is crucial for high-performance computing.
Experiments show significant improvements in RDMA read/write performance, particularly for small payload sizes. With control offload, RDMA read operations achieve near line-rate throughput at a 4 KB payload size, a considerable improvement over the 16 KB payload size required without control offload. Read and write latencies are reduced by almost 22% and 29%, respectively, demonstrating the tangible benefits of offloading RDMA control operations. Finally, most of the work conducted in this thesis has been contributed to the RecoNIC open-source project [25]. |