Graph MMU design for Zedwulf

Bibliographic Details
Main Author: Han, Jianglei
Other Authors: Nachiket Kapre
Format: Final Year Project
Language: English
Published: 2014
Online Access: http://hdl.handle.net/10356/61961
Description
Summary: We previously assembled Zedwulf, a miniature Beowulf cluster of 32 ZedBoards. With the Xillybus 1.3 OS installed on the Processing System (PS), inter-node communication is possible using the Message Passing Interface (MPI). To implement a graph machine core that leverages the high bandwidth and low latency of on-chip memory in the Programmable Logic (PL), this project explores the possibility of an autonomous AXI DMA-based graph memory management unit (Graph MMU) for data transfer between the PS and the PL. The Graph MMU receives the memory base address and burst length from the CPU and stores them in internal registers. It then reconstructs the control signals for the AXI DMA core and sends the latched data to the relevant register addresses, as the Processing System would do to control the DMA core directly. In simulation, the Graph MMU shows a latency of 19.5 cycles. We also benchmark the DMA core with different Max Burst Size hardware settings and observe a 4x speedup when the Max Burst Size is increased from 2 to 16. Both register mode and scatter gather mode DMA configurations are tested against a sparse, graph-like memory access pattern. In the uniform 128-byte single-burst test, scatter gather mode DMA outperforms register mode DMA by a factor of 3.
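
The latch-and-replay behaviour described in the summary can be illustrated with a small software model. This is only a sketch under stated assumptions, not the project's hardware design: the class name `GraphMmuModel` and its methods are hypothetical, only the MM2S (memory-to-stream) channel in register mode is modelled, and the "burst length" received from the CPU is assumed to map to the byte count written to the DMA length register. The register offsets are those commonly documented for the Xilinx AXI DMA core in direct register mode and should be checked against the product guide for the core version in use.

```python
# MM2S channel register offsets of the Xilinx AXI DMA in direct register mode
# (assumed from the AXI DMA product guide; verify for your core version).
MM2S_DMACR = 0x00   # control register; bit 0 is Run/Stop
MM2S_SA = 0x18      # source address, lower 32 bits
MM2S_LENGTH = 0x28  # transfer length in bytes; writing it starts the transfer

DMACR_RUN = 0x1


class GraphMmuModel:
    """Hypothetical software model of the Graph MMU's latch-and-replay step."""

    def __init__(self):
        self.base_addr = 0
        self.length_bytes = 0

    def latch(self, base_addr, length_bytes):
        # Store the values received from the CPU in internal registers.
        self.base_addr = base_addr & 0xFFFFFFFF
        self.length_bytes = length_bytes

    def replay(self):
        # Return the (offset, value) register writes in the order the
        # Processing System would issue them to drive the DMA core directly.
        return [
            (MM2S_DMACR, DMACR_RUN),
            (MM2S_SA, self.base_addr),
            (MM2S_LENGTH, self.length_bytes),
        ]


mmu = GraphMmuModel()
mmu.latch(base_addr=0x10000000, length_bytes=128)
writes = mmu.replay()
for offset, value in writes:
    print(f"write 0x{value:08X} to offset 0x{offset:02X}")
```

The example address `0x10000000` and the 128-byte length are illustrative values only; the length echoes the 128-byte single-burst test mentioned in the summary.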