Efficient query processing on large graphs

Graphs provide a natural way to represent real-world objects and their relationships, and hence they are utilized to model concepts and data in various domains, such as social networks, the Web, biological informatics and spatial databases. In the big data era, most graphs are extremely large and gr...

Full description

Saved in:
Bibliographic Details
Main Author: Zhu, Diwen
Other Authors: Xiao Xiaokui
Format: Theses and Dissertations
Language:English
Published: 2014
Subjects:
Online Access:https://hdl.handle.net/10356/61838
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-61838
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Database management
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Zhu, Diwen
Efficient query processing on large graphs
description Graphs provide a natural way to represent real-world objects and their relationships, and hence they are utilized to model concepts and data in various domains, such as social networks, the Web, biological informatics and spatial databases. In the big data era, most graphs are extremely large and growing rapidly in size, which brings challenges to processing graph queries e ciently. In this thesis, we investigate three important problems in graph query processing, namely, the point-to-point shortest path and distance queries on road networks, the single source shortest path and distance queries on diskresident graphs, and the point-to-point reachability queries on dynamic graphs. These problems are not only interesting in themselves when the graph size becomes large, but also fundamental building blocks of numerous real-world applications. In particular, given two locations s and t in a road network, a point-to-point distance query returns the minimum network distance from s to t, while a point-to-point shortest path query computes the actual route that achieves the minimum distance. To solve this problem, we propose Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice. On the theoretical side, we show that, based on a realistic characteristic of real-world road networks, AH answers any distance query in constant time. In addition, any shortest path query can be answered with k more steps, where k is the number of vertices on the shortest path. On the practical side, AH outperforms the state-of-the-art methods in terms of query time and its space and pre-computation overheads are moderate. Then, given a vertex s in a graph G, a single source shortest distance query from s asks for the distance from s to every other vertex in G, while a single source shortest path query retrieves the shortest path from s to any other vertex. To handle these two types of queries on disk-resident graphs, we propose Highways-on-disk (HoD), a disk-based index that supports both shortest path and distance queries on directed and weighted graphs. Di erent from existing methods, HoD does not assume the graph ts into memory, does not limit the graph type as undirected graph, and does not restrict that the edges have unit weights or even integer weights. Although being more general and exible, HoD is still able to outperform the state-of-the-art methods in terms of query processing time by up to two orders of magnitude. In addition, HoD has a smaller space consumption and pre-computation time in most cases. Finally, given two vertices s and t in G, a reachability query from s to t asks whether there exists a path from s to t in G. To solve the problem on dynamic graphs, we rsti propose the Total Order Labeling (TOL) framework, which summarizes three most advanced methods [22,51,94] for reachability queries on static graphs. Then we investigate novel algorithms to handle updates of TOL, regarding both insertions and deletions on dynamic graphs. To our best knowledge, this is the rst practically e cient and scalable approach that handles the point-to-point reachability queries on dynamic graphs. Finally we propose a new reachability index Butter y, which o ers reduced preprocessing, space, and query costs than any existing indices under TOL [22,51,94]. We experimentally evaluate TOL using a large variety of benchmark datasets with up to twenty million vertices, and we demonstrate the superiority of TOL against alternative solutions for static and dynamic graphs.
author2 Xiao Xiaokui
author_facet Xiao Xiaokui
Zhu, Diwen
format Theses and Dissertations
author Zhu, Diwen
author_sort Zhu, Diwen
title Efficient query processing on large graphs
title_short Efficient query processing on large graphs
title_full Efficient query processing on large graphs
title_fullStr Efficient query processing on large graphs
title_full_unstemmed Efficient query processing on large graphs
title_sort efficient query processing on large graphs
publishDate 2014
url https://hdl.handle.net/10356/61838
_version_ 1759856922483228672
spelling sg-ntu-dr.10356-618382023-03-04T00:46:09Z Efficient query processing on large graphs Zhu, Diwen Xiao Xiaokui School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Database management Graphs provide a natural way to represent real-world objects and their relationships, and hence they are utilized to model concepts and data in various domains, such as social networks, the Web, biological informatics and spatial databases. In the big data era, most graphs are extremely large and growing rapidly in size, which brings challenges to processing graph queries e ciently. In this thesis, we investigate three important problems in graph query processing, namely, the point-to-point shortest path and distance queries on road networks, the single source shortest path and distance queries on diskresident graphs, and the point-to-point reachability queries on dynamic graphs. These problems are not only interesting in themselves when the graph size becomes large, but also fundamental building blocks of numerous real-world applications. In particular, given two locations s and t in a road network, a point-to-point distance query returns the minimum network distance from s to t, while a point-to-point shortest path query computes the actual route that achieves the minimum distance. To solve this problem, we propose Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice. On the theoretical side, we show that, based on a realistic characteristic of real-world road networks, AH answers any distance query in constant time. In addition, any shortest path query can be answered with k more steps, where k is the number of vertices on the shortest path. On the practical side, AH outperforms the state-of-the-art methods in terms of query time and its space and pre-computation overheads are moderate. Then, given a vertex s in a graph G, a single source shortest distance query from s asks for the distance from s to every other vertex in G, while a single source shortest path query retrieves the shortest path from s to any other vertex. To handle these two types of queries on disk-resident graphs, we propose Highways-on-disk (HoD), a disk-based index that supports both shortest path and distance queries on directed and weighted graphs. Di erent from existing methods, HoD does not assume the graph ts into memory, does not limit the graph type as undirected graph, and does not restrict that the edges have unit weights or even integer weights. Although being more general and exible, HoD is still able to outperform the state-of-the-art methods in terms of query processing time by up to two orders of magnitude. In addition, HoD has a smaller space consumption and pre-computation time in most cases. Finally, given two vertices s and t in G, a reachability query from s to t asks whether there exists a path from s to t in G. To solve the problem on dynamic graphs, we rsti propose the Total Order Labeling (TOL) framework, which summarizes three most advanced methods [22,51,94] for reachability queries on static graphs. Then we investigate novel algorithms to handle updates of TOL, regarding both insertions and deletions on dynamic graphs. To our best knowledge, this is the rst practically e cient and scalable approach that handles the point-to-point reachability queries on dynamic graphs. Finally we propose a new reachability index Butter y, which o ers reduced preprocessing, space, and query costs than any existing indices under TOL [22,51,94]. We experimentally evaluate TOL using a large variety of benchmark datasets with up to twenty million vertices, and we demonstrate the superiority of TOL against alternative solutions for static and dynamic graphs. DOCTOR OF PHILOSOPHY (SCE) 2014-11-06T02:05:19Z 2014-11-06T02:05:19Z 2014 2014 Thesis Zhu, D. (2014). Efficient query processing on large graphs. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/61838 10.32657/10356/61838 en 159 p. application/pdf