Efficient index structures for reachability and shortest path queries
Graphs are a fundamental data structure to represent objects and their relations in various domains, e.g., social science, citation analysis, web link analysis, and navigation systems. Reachability and shortest path queries are two types of primitive and well-studied graph queries. In this thesis, w...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: |
2016
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/68898 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-68898 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Database management |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Information systems::Database management Wang, Sibo Efficient index structures for reachability and shortest path queries |
description |
Graphs are a fundamental data structure to represent objects and their relations in various domains, e.g., social science, citation analysis, web link analysis, and navigation systems. Reachability and shortest path queries are two types of primitive and well-studied graph queries. In this thesis, we investigate on how to build efficient index structures to answer the reachability queries and two variants of the shortest path queries. The first variant of the shortest path query considers the shortest path in timetable graphs, where it requires the consideration of spatiotemporal constraints between different edges. The second variant aims to provide the shortest path under an additional cost constraint. These problems find many real world applications, and are challenging to achieve high efficiency on sizable graphs. First, we study the reachability queries on directed graphs. Given a directed graph $G$, a source $s$, and a target $t$, a reachability query asks whether there exists a path from $s$ to $t$ in $G$. The state-of-the-art solutions for answering reachability queries are a class of labelling schemes referred to as {\em Hierarchical Hub Labelings (HHL)}. One distinct characteristic of {\em HHL} is that any {\em HHL} index implicitly or explicitly defines a total order of vertices in the graph, and the performance of the index is solely decided by the ordering. Several heuristic approaches have been proposed for vertex ordering in {\em HHL}, but none of them provides any worst-case guarantee. We present a novel study on the vertex ordering in {\em HHL}, and devise a polynomial-time algorithm for vertex ordering that yields an {\em HHL} index whose size is at most $O(\sqrt{n})$ times the optimal one. Motivated by our theoretical study, we design a new heuristic for vertex ordering, and show that it leads to an {\em HHL} index with superior practical performance for reachability queries. Besides, identifying fast routes in public transportation networks is an important problem with applications in map services and navigation systems. In public transportation networks, any traversal between different network locations relies on transportation services (e.g., buses, subways, and ferries.) that run on fixed routes with pre-defined schedules. A public transportation network can often be modeled as a timetable graph where (i) each node represents a station; and (ii) each directed edge $\langle u, v\rangle $ is associated with a timetable that records the departure (resp. arrival) time of each vehicle at station $u$ (resp. $v$). There are several types of shortest path queries on timetable graphs. We study three types of shortest path queries for route planning that have been extensively studied on timetable graphs: {\em earliest arrival path (EAP)}, {\em latest departure path (LDP)}, and {\em shortest duration path (SDP)} queries. These three types of queries aim to find a path that arrives at a place as early as possible, departs as late as possible but arrives on time, and has the shortest travel time, respectively. We present {\em Timetable Labelling (TTL)}, an efficient indexing technique for shortest path queries on timetable graphs. The basic idea of {\em TTL} is to associate each node $u$ with a set of labels, each of which records the shortest travel time from $u$ to some other node $v$ given a certain departure time from $u$; such labels would then be used during query processing to improve efficiency. In addition, we propose query algorithms that enable {\em TTL} to support EAP, LDP, and SDP queries, and investigate how we reduce the space consumption of {\em TTL} with advanced preprocessing and label compression methods. By conducting an extensive set of experiments on real world datasets, we demonstrate that {\em TTL} significantly outperforms the states of the art in terms of query efficiency, while incurring moderate preprocessing and space overheads. Finally, {\em constrained shortest path (CSP)} is a variant of the shortest path problem with an additional total cost constraint. Specifically, in a CSP query, each edge in the road network is associated with both a length and a cost, e.g., travel distance and time. Given an origin $s$, a destination $t$, and a cost constraint $\theta$, the goal is to find the path from $s$ to $t$ that minimizes its total length, while satisfying that its total cost does not exceed $\theta$. Deriving the exact answer for a CSP query has been proven to be NP-hard, and the majority of previous work focuses on approximate solutions. We propose {\em COLA}, the first practical solution for approximate CSP processing on large road networks. {\em COLA} exploits the facts that a road network can be effectively partitioned, and that there exists a relatively small set of landmark vertices that commonly appear in CSP results. Meanwhile, {\em COLA} includes both an effective indexing scheme for partition boundary vertices, and an efficient on-the-fly algorithm called $\alpha$-Dijk for path computation within a partition. Extensive experiments demonstrate that on continent-sized networks with tens of millions of vertices, {\em COLA} answers an approximate CSP query in sub-second time, whereas existing methods take hours. Interestingly, even without an index, the $\alpha$-Dijk algorithm in {\em COLA} still outperforms previous solutions by more than an order of magnitude. |
author2 |
Xiao Xiaokui |
author_facet |
Xiao Xiaokui Wang, Sibo |
format |
Theses and Dissertations |
author |
Wang, Sibo |
author_sort |
Wang, Sibo |
title |
Efficient index structures for reachability and shortest path queries |
title_short |
Efficient index structures for reachability and shortest path queries |
title_full |
Efficient index structures for reachability and shortest path queries |
title_fullStr |
Efficient index structures for reachability and shortest path queries |
title_full_unstemmed |
Efficient index structures for reachability and shortest path queries |
title_sort |
efficient index structures for reachability and shortest path queries |
publishDate |
2016 |
url |
https://hdl.handle.net/10356/68898 |
_version_ |
1759855439853387776 |
spelling |
sg-ntu-dr.10356-688982023-03-04T00:38:45Z Efficient index structures for reachability and shortest path queries Wang, Sibo Xiao Xiaokui School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Database management Graphs are a fundamental data structure to represent objects and their relations in various domains, e.g., social science, citation analysis, web link analysis, and navigation systems. Reachability and shortest path queries are two types of primitive and well-studied graph queries. In this thesis, we investigate on how to build efficient index structures to answer the reachability queries and two variants of the shortest path queries. The first variant of the shortest path query considers the shortest path in timetable graphs, where it requires the consideration of spatiotemporal constraints between different edges. The second variant aims to provide the shortest path under an additional cost constraint. These problems find many real world applications, and are challenging to achieve high efficiency on sizable graphs. First, we study the reachability queries on directed graphs. Given a directed graph $G$, a source $s$, and a target $t$, a reachability query asks whether there exists a path from $s$ to $t$ in $G$. The state-of-the-art solutions for answering reachability queries are a class of labelling schemes referred to as {\em Hierarchical Hub Labelings (HHL)}. One distinct characteristic of {\em HHL} is that any {\em HHL} index implicitly or explicitly defines a total order of vertices in the graph, and the performance of the index is solely decided by the ordering. Several heuristic approaches have been proposed for vertex ordering in {\em HHL}, but none of them provides any worst-case guarantee. We present a novel study on the vertex ordering in {\em HHL}, and devise a polynomial-time algorithm for vertex ordering that yields an {\em HHL} index whose size is at most $O(\sqrt{n})$ times the optimal one. Motivated by our theoretical study, we design a new heuristic for vertex ordering, and show that it leads to an {\em HHL} index with superior practical performance for reachability queries. Besides, identifying fast routes in public transportation networks is an important problem with applications in map services and navigation systems. In public transportation networks, any traversal between different network locations relies on transportation services (e.g., buses, subways, and ferries.) that run on fixed routes with pre-defined schedules. A public transportation network can often be modeled as a timetable graph where (i) each node represents a station; and (ii) each directed edge $\langle u, v\rangle $ is associated with a timetable that records the departure (resp. arrival) time of each vehicle at station $u$ (resp. $v$). There are several types of shortest path queries on timetable graphs. We study three types of shortest path queries for route planning that have been extensively studied on timetable graphs: {\em earliest arrival path (EAP)}, {\em latest departure path (LDP)}, and {\em shortest duration path (SDP)} queries. These three types of queries aim to find a path that arrives at a place as early as possible, departs as late as possible but arrives on time, and has the shortest travel time, respectively. We present {\em Timetable Labelling (TTL)}, an efficient indexing technique for shortest path queries on timetable graphs. The basic idea of {\em TTL} is to associate each node $u$ with a set of labels, each of which records the shortest travel time from $u$ to some other node $v$ given a certain departure time from $u$; such labels would then be used during query processing to improve efficiency. In addition, we propose query algorithms that enable {\em TTL} to support EAP, LDP, and SDP queries, and investigate how we reduce the space consumption of {\em TTL} with advanced preprocessing and label compression methods. By conducting an extensive set of experiments on real world datasets, we demonstrate that {\em TTL} significantly outperforms the states of the art in terms of query efficiency, while incurring moderate preprocessing and space overheads. Finally, {\em constrained shortest path (CSP)} is a variant of the shortest path problem with an additional total cost constraint. Specifically, in a CSP query, each edge in the road network is associated with both a length and a cost, e.g., travel distance and time. Given an origin $s$, a destination $t$, and a cost constraint $\theta$, the goal is to find the path from $s$ to $t$ that minimizes its total length, while satisfying that its total cost does not exceed $\theta$. Deriving the exact answer for a CSP query has been proven to be NP-hard, and the majority of previous work focuses on approximate solutions. We propose {\em COLA}, the first practical solution for approximate CSP processing on large road networks. {\em COLA} exploits the facts that a road network can be effectively partitioned, and that there exists a relatively small set of landmark vertices that commonly appear in CSP results. Meanwhile, {\em COLA} includes both an effective indexing scheme for partition boundary vertices, and an efficient on-the-fly algorithm called $\alpha$-Dijk for path computation within a partition. Extensive experiments demonstrate that on continent-sized networks with tens of millions of vertices, {\em COLA} answers an approximate CSP query in sub-second time, whereas existing methods take hours. Interestingly, even without an index, the $\alpha$-Dijk algorithm in {\em COLA} still outperforms previous solutions by more than an order of magnitude. DOCTOR OF PHILOSOPHY (SCE) 2016-07-18T04:52:21Z 2016-07-18T04:52:21Z 2016 Thesis Wang, S. (2016). Efficient index structures for reachability and shortest path queries. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/68898 10.32657/10356/68898 en 146 p. application/pdf |