Enhancing FPGA-based RDMA-enabled SmartNICs for scale-out computing

Bibliographic Details
Main Author: Kolekar Aditya Dilip
Other Authors: Kim Tae Hyoung
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/173997
Institution: Nanyang Technological University
Description
Abstract: To address burgeoning data growth and expanding workloads, modern data centers are equipped with thousands of network-connected hosts, each featuring CPUs and accelerators such as ASICs, FPGAs and GPUs. High-speed networking requirements have driven the emergence of SmartNIC technologies and Remote Direct Memory Access (RDMA). This thesis focuses on FPGA-based RDMA-enabled SmartNICs; more specifically, it enhances features in RecoNIC, an open-source adaptive SmartNIC platform from AMD. RecoNIC provides a platform for implementing FPGA accelerators while allowing those accelerators to initiate RDMA read/write requests for communication through the host. Relying on CPUs for RDMA control operations typically results in higher read/write latency, especially when transmitting small messages. This work extends RecoNIC so that its RDMA offloading engine is shared between the host CPU and FPGA accelerators, and adds FPGA-side DRAM support. Moreover, instead of relying on host CPUs to control the RDMA engine, control operations are offloaded onto FPGA accelerators built with high-level synthesis (HLS), significantly reducing RDMA latency, which is crucial for high-performance computing. Experiments show significant improvements in RDMA read/write performance, particularly for small payload sizes. For RDMA reads with control offload, near line-rate throughput is achieved at a 4 KB payload size, compared with the 16 KB payload size required without control offload. Read and write latency are reduced by almost 22% and 29%, respectively, demonstrating the tangible benefits of offloading RDMA control operations. Finally, most of the work conducted in this thesis has been contributed to the RecoNIC open-source project [25].
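The claim that CPU-driven RDMA control hurts small messages most can be illustrated with a first-order model: if every operation pays a fixed control overhead t0 before its S-byte payload is serialized at line rate B, effective throughput is S / (t0 + S/B), so the payload needed to reach a given fraction of line rate grows linearly with t0. The sketch below is a hypothetical model, not the thesis's measurement method; the function names, the 100 Gb/s line rate and the overhead values are illustrative assumptions. It also assumes one operation at a time, whereas real RDMA engines keep several requests in flight, so it overstates absolute payload thresholds and only shows the trend.

```python
import math

# Hypothetical line rate used only for illustration (assumption, not from the thesis).
LINE_RATE_GBPS = 100.0


def effective_throughput_gbps(payload_bytes: int, overhead_us: float,
                              line_rate_gbps: float) -> float:
    """Throughput of back-to-back, non-overlapped operations.

    Each operation pays a fixed control overhead (microseconds) plus the
    wire time of its payload at line rate. 1 Gb/s == 1e3 bits per microsecond.
    """
    bits = payload_bytes * 8
    wire_time_us = bits / (line_rate_gbps * 1e3)
    total_us = overhead_us + wire_time_us
    return bits / (total_us * 1e3)  # bits/us converted back to Gb/s


def min_payload_for_fraction(fraction: float, overhead_us: float,
                             line_rate_gbps: float) -> int:
    """Smallest payload (bytes) reaching `fraction` of line rate in this model.

    Solving S*8 / (S*8 + t0*B) >= f for S gives S >= f/(1-f) * t0*B / 8,
    i.e. the required payload scales linearly with the overhead t0.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    bits_in_overhead = overhead_us * line_rate_gbps * 1e3
    return math.ceil(fraction / (1.0 - fraction) * bits_in_overhead / 8)


if __name__ == "__main__":
    # Hypothetical per-operation overheads: a slower host-driven control path
    # versus a faster offloaded one. Values are illustrative only.
    for overhead in (2.0, 0.5):
        need = min_payload_for_fraction(0.9, overhead, LINE_RATE_GBPS)
        tput_4k = effective_throughput_gbps(4096, overhead, LINE_RATE_GBPS)
        print(f"overhead {overhead} us: {need} B for 90% of line rate, "
              f"{tput_4k:.1f} Gb/s at 4 KB")
```

Because the required payload scales linearly with t0 in this model, any reduction in per-operation control overhead lowers the payload size at which an RDMA stream approaches line rate, which matches the direction of the trend the abstract reports for control offload.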