Enhancing FPGA-based RDMA-enabled SmartNICs for scale-out computing

Bibliographic Details
Main Author: Kolekar Aditya Dilip
Other Authors: Kim Tae Hyoung
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/173997
Institution: Nanyang Technological University
Description
Summary: To cope with rapid data growth and expanding workloads, modern data centers deploy thousands of network-connected hosts, each equipped with CPUs and accelerators such as ASICs, FPGAs, and GPUs. The resulting demand for high-speed networking has driven the emergence of SmartNIC technologies and Remote Direct Memory Access (RDMA). This thesis focuses on FPGA-based RDMA-enabled SmartNICs, and more specifically on enhancing RecoNIC, an open-source adaptive SmartNIC platform from AMD. RecoNIC provides a platform for implementing FPGA accelerators while allowing those accelerators to initiate RDMA read/write requests for communication through the host. Relying on CPUs for RDMA control operations typically results in higher read/write latency, especially when transmitting small messages. This work extends RecoNIC so that its RDMA offloading engine is shared between the host CPU and the FPGA accelerators, and adds FPGA-side DRAM support. Moreover, instead of relying on host CPUs to control the RDMA engine, control operations are offloaded onto FPGA accelerators implemented via high-level synthesis (HLS), which reduces RDMA latency, a crucial factor in high-performance computing. The experiments show improved RDMA read/write performance, most notably for small payload sizes. For RDMA read operations, near line-rate throughput is achieved at a 4 KB payload size with control offload, compared with the 16 KB payload size required without it. The latency of read and write operations is reduced by almost 22% and 29%, respectively, demonstrating the benefit of offloading RDMA control operations. Most of the work conducted in this thesis has been contributed to the RecoNIC open-source project [25].