Graph MMU design for Zedwulf

Bibliographic Details
Main Author:	Han, Jianglei
Other Authors:	Nachiket Kapre
Format:	Final Year Project
Language:	English
Published:	2014
Subjects:	DRNTU::Engineering::Computer science and engineering::Hardware::Logic design; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; DRNTU::Engineering::Computer science and engineering::Hardware::Input/output and data communications
Online Access:	http://hdl.handle.net/10356/61961
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-61961
record_format	dspace
school	School of Computer Engineering; Centre for High Performance Embedded Systems
degree	Bachelor of Engineering (Computer Engineering)
date_added	2014-12-12T04:26:06Z
extent	42 p.
mime_type	application/pdf
record_timestamp	2023-03-03T20:36:54Z
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	We previously assembled a miniature Beowulf cluster from 32 ZedBoards, named Zedwulf. With Xillybus 1.3 OS installed on the Processing System (PS), the nodes communicate with one another using the Message Passing Interface (MPI). To implement a graph machine core that leverages the high bandwidth and low latency of on-chip memory in the Programmable Logic (PL), this project explores an autonomous AXI DMA-based graph memory management unit (Graph MMU) for transferring data between the PS and PL. The Graph MMU receives the memory base address and burst length from the CPU and stores them in internal registers. It then reconstructs the control signals for the AXI DMA core and writes the latched values to the relevant DMA register addresses, as the Processing System would when controlling the DMA core directly. In simulation, the Graph MMU exhibits a latency of 19.5 cycles. We also benchmark the DMA core under different Max Burst Size hardware settings and observe a 4 times speedup when the Max Burst Size is increased from 2 to 16. Both register mode and scatter gather mode DMA configurations are tested against a sparse, graph-like memory access pattern; in the uniform 128-byte single-burst test, scatter gather mode DMA is 3 times faster than register mode DMA.
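The register hand-off described in the abstract can be pictured with a small behavioural model. This is a sketch under stated assumptions, not the project's RTL: it assumes the Graph MMU drives the MM2S (memory-to-stream) channel of the Xilinx AXI DMA core in Direct Register (simple) mode, and it takes the register offsets from the Xilinx AXI DMA product guide (PG021). The names MmuRequest, register_writes and sparse_gather are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Offsets from the Xilinx AXI DMA register map (PG021) for the MM2S channel
# in Direct Register (simple) mode. Assumption: 32-bit addressing, as on Zynq-7000.
MM2S_DMACR = 0x00   # control register
MM2S_SA = 0x18      # source address
MM2S_LENGTH = 0x28  # bytes to transfer; writing a non-zero value starts the DMA
DMACR_RS = 0x1      # Run/Stop bit of MM2S_DMACR


@dataclass(frozen=True)
class MmuRequest:
    """Hypothetical request latched by the Graph MMU: base address and length."""
    base_addr: int
    length_bytes: int


def register_writes(req: MmuRequest) -> list[tuple[int, int]]:
    """Return the (offset, value) AXI4-Lite writes that launch one MM2S transfer.

    The order follows the PS-side programming sequence: set Run/Stop,
    write the source address, then write the length last, because that
    write is what starts the transfer.
    """
    if not 0 <= req.base_addr < 2**32:
        raise ValueError("base address must fit in 32 bits")
    if req.length_bytes <= 0:
        raise ValueError("length must be positive; zero does not start a transfer")
    return [
        (MM2S_DMACR, DMACR_RS),
        (MM2S_SA, req.base_addr),
        (MM2S_LENGTH, req.length_bytes),
    ]


def sparse_gather(reqs: list[MmuRequest]) -> list[tuple[int, int]]:
    """Register mode: each scattered burst needs its own write sequence.

    On hardware the MMU would also wait for the Idle bit of MM2S_DMASR
    between transfers; that handshake is not modelled here.
    """
    writes: list[tuple[int, int]] = []
    for req in reqs:
        writes.extend(register_writes(req))
    return writes


# A single 128-byte transfer, the size used in the uniform single-burst test.
single = register_writes(MmuRequest(base_addr=0x1000_0000, length_bytes=128))
print([(hex(off), hex(val)) for off, val in single])
# [('0x0', '0x1'), ('0x18', '0x10000000'), ('0x28', '0x80')]
```

In this model register mode costs three control writes, plus a completion wait, for every scattered burst, whereas a scatter gather configuration lets the DMA core fetch a chain of buffer descriptors from memory once it has been started. That is one plausible reading of the gap between the two modes, not a result taken from the project.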
