Optimized GPU algorithms for sparse data problems
Main Author: | Pham, Nguyen Quang Anh
---|---
Other Authors: | Wen Yonggang
Format: | Theses and Dissertations
Language: | English
Published: | 2017
Subjects: | DRNTU::Engineering::Computer science and engineering
Online Access: | http://hdl.handle.net/10356/72409
Institution: | Nanyang Technological University
id | sg-ntu-dr.10356-72409
---|---
record_format | dspace
institution | Nanyang Technological University
building | NTU Library
continent | Asia
country | Singapore
content_provider | NTU Library
collection | DR-NTU
language | English
topic | DRNTU::Engineering::Computer science and engineering
description |
Many important problems in science and engineering today deal with sparse data. Examples of sparse data include sparse matrices, where the number of nonzero values is much smaller than the total number possible and the nonzeros are scattered rather than regularly positioned, and graphs in which the average degree is low and the edge set has an irregular structure. This type of data frequently arises from real-world sources and is used to model physical, biological or social phenomena. Many sparse datasets are large and require parallel computing to process efficiently. However, sparsity introduces a number of performance challenges: irregularities in the size and distribution of the data lead to load imbalance between threads and to scattered memory accesses that strain memory systems optimized for block-based access.
Graphics processing units (GPUs) have been used successfully in recent years to solve many big data problems. GPUs can execute thousands of threads simultaneously and offer much higher throughput and memory bandwidth than CPUs. Nevertheless, the GPU architecture is better suited to processing dense, regular datasets. Sparse data problems such as sparse matrix-vector and matrix-matrix multiplication, breadth-first search, shortest paths and other graph algorithms achieve much smaller speedups on GPUs than their dense counterparts. The problems of load imbalance, high memory latency and bandwidth saturation listed earlier are compounded on GPUs by their massively multithreaded, SIMD execution model.
In this thesis, we study three fundamental sparse data problems: sparse matrix-vector multiplication (SpMV), sparse matrix-matrix multiplication (SpGEMM) and graph coloring. Each problem is used as a primitive in a number of higher-level applications, so accelerating these problems yields improvements across a broad range of other problems. For each of them, we present GPU algorithms that are based on novel techniques and offer best-in-class performance. Our algorithms start from an analysis of each problem's key performance issues and bottlenecks, and use both heuristic and theoretically motivated techniques to overcome these limitations. While some of the techniques are problem-specific, others generalize to issues common to many GPU-based sparse data computations.
Our SpMV algorithm is based on compacting a sparse matrix to increase its density and the regularity of its data accesses, and on using the GPU's fast shared memory to increase the efficiency of repeated SpMV computations. The algorithm reduces I/O for vector accesses by 37% on average and improves performance by up to 35% over the previously fastest GPU SpMV algorithm. Our SpGEMM algorithm efficiently enumerates all the work performed during a computation to achieve perfect load balancing, and uses a randomized algorithm to partition the matrix nearly optimally into pieces small enough to be processed in the fast but limited shared memory. It is up to 2.5× faster than the state-of-the-art GPU SpGEMM algorithm on the most difficult, unstructured matrices. Finally, we present two coloring algorithms, optimized for coloring quality and for speed respectively. The first uses a simple counter mechanism to greatly improve overall work efficiency, while the second achieves both high parallelism and relatively high efficiency by randomly coloring the graph based on estimates of its chromatic number. Compared to existing GPU coloring algorithms, our first algorithm uses 1.1–4.3× fewer colors on average, while the second uses slightly more colors but runs 2.7–4.3× faster than the other algorithms. The techniques we introduce form the basis of our ongoing work on GPU sparse matrix and graph algorithms, as we seek to bridge the performance gap between sparse and dense data algorithms on GPUs. |
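To make the SpMV setting concrete, here is a minimal scalar CSR SpMV kernel in CUDA, the textbook one-thread-per-row baseline rather than the compaction and shared-memory algorithm described above. The kernel name, launch configuration and array names (row_ptr, col_idx, vals) are our own illustrative choices, not taken from the thesis; the comments point out the load imbalance and scattered vector reads that the abstract identifies as the core difficulties.

```cuda
// Scalar CSR SpMV: one thread per row. Illustrative baseline only; this is
// not the compaction/shared-memory algorithm described in the abstract.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void csr_spmv_scalar(int n_rows, const int *row_ptr,
                                const int *col_idx, const float *vals,
                                const float *x, float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    float sum = 0.0f;
    // Row lengths vary, so threads in a warp finish at different times
    // (load imbalance), and x[col_idx[j]] is a data-dependent, scattered
    // read that defeats coalescing: exactly the sparse-data problems above.
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        sum += vals[j] * x[col_idx[j]];
    y[row] = sum;
}

int main()
{
    // Tiny 3x3 example matrix in CSR form:
    // [1 0 2]
    // [0 3 0]
    // [4 5 6]
    const int n = 3, nnz = 6;
    int h_row_ptr[n + 1] = {0, 2, 3, 6};
    int h_col_idx[nnz]   = {0, 2, 1, 0, 1, 2};
    float h_vals[nnz]    = {1, 2, 3, 4, 5, 6};
    float h_x[n]         = {1, 1, 1};
    float h_y[n];

    int *d_row_ptr, *d_col_idx; float *d_vals, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&d_col_idx, sizeof(h_col_idx));
    cudaMalloc(&d_vals, sizeof(h_vals));
    cudaMalloc(&d_x, sizeof(h_x));
    cudaMalloc(&d_y, sizeof(h_y));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals, sizeof(h_vals), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

    csr_spmv_scalar<<<1, 128>>>(n, d_row_ptr, d_col_idx, d_vals, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("y[%d] = %g\n", i, h_y[i]);  // 3 3 15
    return 0;
}
```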
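The SpGEMM paragraph above describes enumerating all the work in a computation to achieve perfect load balancing. A generic way to picture this, sketched below with assumed names (row_flops, work_offsets) and not taken from the thesis, is to count the multiplications each output row of C = A·B requires and prefix-sum the counts with Thrust, so the total work can be split into equal chunks however skewed the row lengths are.

```cuda
// Upfront work enumeration for SpGEMM (C = A * B, both in CSR).
// Generic sketch of the "count the work, then split it evenly" idea;
// it is not the load-balancing scheme from the thesis.
#include <thrust/device_vector.h>
#include <thrust/scan.h>

__global__ void row_flops(int n_rows_A,
                          const int *A_row_ptr, const int *A_col_idx,
                          const int *B_row_ptr,
                          long long *flops)  // multiplications per output row
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows_A) return;

    long long work = 0;
    for (int j = A_row_ptr[row]; j < A_row_ptr[row + 1]; ++j) {
        int k = A_col_idx[j];                     // A(row, k) is nonzero
        work += B_row_ptr[k + 1] - B_row_ptr[k];  // touches every nnz of B's row k
    }
    flops[row] = work;
}

// Host-side sketch: an exclusive prefix sum turns per-row counts into global
// work offsets. Binary-searching evenly spaced positions into `offsets` then
// assigns each thread block an (almost) equal number of multiplications,
// no matter how skewed the individual rows are.
thrust::device_vector<long long>
work_offsets(const thrust::device_vector<long long> &flops)
{
    thrust::device_vector<long long> offsets(flops.size() + 1, 0);
    thrust::inclusive_scan(flops.begin(), flops.end(), offsets.begin() + 1);
    return offsets;
}
```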
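For graph coloring, the counter mechanism and the random-palette scheme mentioned above are specific to the thesis and are not reproduced here. As a reference point, the sketch below shows a common speculative GPU coloring baseline, assuming a CSR adjacency structure and a palette capped at 64 colors: uncolored vertices tentatively take the smallest color unused among their neighbors, a conflict pass un-colors the higher-indexed endpoint of any same-colored edge, and the host repeats both kernels until a pass makes no change.

```cuda
// Speculative greedy coloring on a CSR adjacency structure. Generic baseline
// sketch only (not the counter-based or random-palette algorithms from the
// thesis); for simplicity the palette is capped at 64 colors.
#include <cuda_runtime.h>

__global__ void tentative_color(int n, const int *adj_ptr, const int *adj,
                                int *color)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= n || color[v] >= 0) return;          // keep already accepted colors

    unsigned long long forbidden = 0;             // bit c set => color c is taken
    for (int e = adj_ptr[v]; e < adj_ptr[v + 1]; ++e) {
        int c = color[adj[e]];
        if (c >= 0 && c < 64) forbidden |= 1ULL << c;
    }
    color[v] = __ffsll((long long)~forbidden) - 1;  // smallest free color, or -1
}

__global__ void resolve_conflicts(int n, const int *adj_ptr, const int *adj,
                                  int *color, int *changed)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= n || color[v] < 0) return;
    for (int e = adj_ptr[v]; e < adj_ptr[v + 1]; ++e) {
        int u = adj[e];
        // Two neighbors speculatively chose the same color: the larger index
        // gives its color back and retries in the next round.
        if (u < v && color[u] == color[v]) {
            color[v] = -1;
            *changed = 1;
            break;
        }
    }
}

// Host loop (sketch): initialize color[] to -1, then alternate the two kernels,
// copying `changed` back each round, until a round finishes with changed == 0.
```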
author2 | Wen Yonggang
---|---
format | Theses and Dissertations
author | Pham, Nguyen Quang Anh
title | Optimized GPU algorithms for sparse data problems
publishDate | 2017
url | http://hdl.handle.net/10356/72409
spelling |
sg-ntu-dr.10356-72409 2023-03-04T00:52:56Z Optimized GPU algorithms for sparse data problems. Pham, Nguyen Quang Anh. Wen Yonggang. School of Computer Science and Engineering. Fan Rui. DRNTU::Engineering::Computer science and engineering. [Abstract as given in the description field above.] Doctor of Philosophy (SCE). 2017-07-10T08:38:20Z 2017-07-10T08:38:20Z 2017. Thesis. Pham, N. Q. A. (2017). Optimized GPU algorithms for sparse data problems. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/72409 10.32657/10356/72409 en 121 p. application/pdf |