IMPLEMENTATION OF PARALLEL LOUVAIN ALGORITHM USING MULTI-GPU

The Louvain algorithm is a method to detect communities present in a graph. Among parallel implementations of the algorithm, GPU implementations show a significant increase in performance but are still bound by the throughput and memory limit of one GPU. Along with the increase in the size of gra...

Full description

Saved in:
Bibliographic Details
Main Author: Bezalel Iman Gustaman, Steve
Format: Final Project
Language:Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/65816
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Institut Teknologi Bandung
Language: Indonesia
Description
Summary:The Louvain algorithm is a method to detect communities present in a graph. Among parallel implementations of the algorithm, GPU implementations show a significant increase in performance but are still bound by the throughput and memory limit of one GPU. Along with the increase in the size of graphs present in the real world, a method is needed to process the Louvain algorithm for larger graphs without decreasing the performance. This can be achieved by utilizing multiGPU. This final project aims to implement the parallel Louvain algorithm in a multi-GPU environment. The multi-GPU implementation is adapted from a single-GPU implementation from previous research by Naim in 2017. In the multi-GPU implementation designed, each GPU is responsible to process one partition of the input graph. There are also data synchronizations among the partitions using inter-GPU communications. This implementation utilizes NVIDIA CUDA for GPU programming and NCCL library for inter-GPU communication APIs. Compared with the single-GPU implementation, the multi-GPU implementation designed needs additional memory. The percentage of additional overhead memory needed is lower in graphs with a higher edge-to-node ratio. Other than that, with the usage of 8 GPUs, the multi-GPU algorithm designed gives off up to 1.2x speedup in processing graphs with low-degree largest-degree-node, and up to 3.4x speedup in processing graphs with varying node degrees and a high-degree largestdegree-node.