GraphH: High performance big graph analytics in small clusters

It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk...

Bibliographic Details
Main Authors: SUN, Peng, WEN, Yonggang, TA, Nguyen Binh Duong, XIAO, Xiaokui
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Online Access: https://ink.library.smu.edu.sg/sis_research/4765
https://ink.library.smu.edu.sg/context/sis_research/article/5768/viewcontent/1705.05595.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5768
record_format dspace
spelling sg-smu-ink.sis_research-5768 2020-01-16T10:27:26Z 2017-09-08T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4765 info:doi/10.1109/CLUSTER.2017.51 https://ink.library.smu.edu.sg/context/sis_research/article/5768/viewcontent/1705.05595.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Graph Processing
Distributed Computing System
Network
Numerical Analysis and Scientific Computing
Software Engineering
description It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process one partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even on a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster than popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
format text
author SUN, Peng
WEN, Yonggang
TA, Nguyen Binh Duong
XIAO, Xiaokui
title GraphH: High performance big graph analytics in small clusters
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/4765
https://ink.library.smu.edu.sg/context/sis_research/article/5768/viewcontent/1705.05595.pdf