GraphH: High performance big graph analytics in small clusters

It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk...

Bibliographic Details
Main Authors:	SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; XIAO, Xiaokui
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2017
Subjects:	Graph Processing; Distributed Computing System; Network; Numerical Analysis and Scientific Computing; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4765 https://ink.library.smu.edu.sg/context/sis_research/article/5768/viewcontent/1705.05595.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5768
record_format	dspace
spelling	sg-smu-ink.sis_research-5768 2020-01-16T10:27:26Z GraphH: High performance big graph analytics in small clusters SUN, Peng WEN, Yonggang TA, Nguyen Binh Duong XIAO, Xiaokui It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs. 2017-09-08T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4765 info:doi/10.1109/CLUSTER.2017.51 https://ink.library.smu.edu.sg/context/sis_research/article/5768/viewcontent/1705.05595.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Graph Processing Distributed Computing System Network Numerical Analysis and Scientific Computing Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Graph Processing; Distributed Computing System; Network; Numerical Analysis and Scientific Computing; Software Engineering
spellingShingle	Graph Processing Distributed Computing System Network Numerical Analysis and Scientific Computing Software Engineering SUN, Peng WEN, Yonggang TA, Nguyen Binh Duong XIAO, Xiaokui GraphH: High performance big graph analytics in small clusters
description	It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
format	text
author	SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; XIAO, Xiaokui
author_facet	SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; XIAO, Xiaokui
author_sort	SUN, Peng
title	GraphH: High performance big graph analytics in small clusters
title_short	GraphH: High performance big graph analytics in small clusters
title_full	GraphH: High performance big graph analytics in small clusters
title_fullStr	GraphH: High performance big graph analytics in small clusters
title_full_unstemmed	GraphH: High performance big graph analytics in small clusters
title_sort	graphh: high performance big graph analytics in small clusters
publisher	Institutional Knowledge at Singapore Management University
publishDate	2017
url	https://ink.library.smu.edu.sg/sis_research/4765 https://ink.library.smu.edu.sg/context/sis_research/article/5768/viewcontent/1705.05595.pdf
_version_	1770575025056776192

Similar Items