Efficient temporal graph processing : shortest paths, closeness centralities and S-cliques
Temporal graph is a data structure to represent a dynamic graph, where vertices and edges can change over time. Countless real-world systems can be modelled as temporal graphs, such as transportation networks and social networks. It enriches classic graph research by adding a new time dimension, whi...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2021
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/152573 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | Temporal graph is a data structure to represent a dynamic graph, where vertices and edges can change over time. Countless real-world systems can be modelled as temporal graphs, such as transportation networks and social networks. It enriches classic graph research by adding a new time dimension, which leads to many new problems that require more specialised and sophisticated algorithms to solve. In this thesis, three types of problems in temporal graphs are studied. The thesis starts from the problem of efficient computation of earliest-arrival paths, which is one type of shortest paths in temporal graphs, representing a one-to-one relation between two vertices in a temporal graph. Then, the evolving closeness centrality problem in temporal graphs is studied, which is a one-to-many relation of a vertex to a set of other vertices in a temporal graph. Finally, persistent s-clique enumeration problem in temporal graphs is studied, a many-to-many relation among a set of vertices in a temporal graph.
Answering earliest-arrival queries in temporal graphs is one of the most fundamental studies with numerous applications, such as information diffusion and measurement of temporal closeness centrality. As graph sizes are growing rapidly, speedup of query execution time becomes even more important. In Chapter 3, a novel edge-centric parallel algorithm is proposed for solving single-source earliest-arrival problems in temporal graphs based on a new data structure named EdgeScan-Dependency Graph. The proposed parallel algorithm is evaluated by theoretical analysis as well as by empirical experiments on real-world temporal graphs and synthetic graphs. Empirical results show that the new parallel algorithm outperforms the existing serial algorithm by one order of magnitude on multi-core processors for real-world data and synthetic data.
In Chapter 4, evolving closeness centrality in temporal graphs is studied, which is one of the key indicators for vertex importance in social network analytics. Since social networks are constantly growing, it is essential to monitor a vertex’s closeness centrality through time in order to study its trend in influential power. In this chapter, evolving closeness centrality problem is defined by computing a vertex’s closeness centrality for all graph snapshots in a temporal graph. Novel algorithms that efficiently utilise graph temporal information are proposed, and theoretical analysis on its time complexity is given, which is shown to be small for real-world social networks. Experiments on various real-world data sets show speedups of one to two orders of magnitude compared to the existing graph-topology-based algorithms. Lastly, synthetic data sets are created to show how topological and temporal graph features affect the closeness centrality computation time.
Enumerating maximal cliques is a common method for mining hidden communities in social network analysis. Recent studies focus on relaxed cliques due to the fact that communities in the real world are rarely ‘fully connected’ cliques. Among them, a reasonably relaxed version is maximal s-clique, which allows a vertex to be at distance at most s to all others. Since social networks are temporal graphs that constantly evolve, it is more beneficial to find persistent communities that span multiple time frames. Thus, the definition of s-cliques is extended to temporal graphs and a new persistent clique pattern named Persistent connected S-Cliques is defined and a novel algorithm to enumerate all maximal PSCs in temporal graphs is proposed. In Chapter 5, firstly maximal connected s-cliques enumeration in static graphs is revisited and a faster algorithm is proposed. The improvement over the state-of-the-art algorithms is twofold. First, an initial divide and conquer phase is introduced for more efficient vertex pruning. Second, the proposed algorithm has a smaller search space by visiting only connected s-cliques. Then, persistent connected s-cliques in temporal graphs are defined and a novel algorithm to enumerate all maximal candidates is proposed. Lastly, extensive experiments are conducted both on synthetic datasets and real-world datasets to show the efficiency of proposed algorithms.
To summarise, three types of classic graph problems in the setting of temporal graphs are studied. Firstly, their new definitions in temporal graphs, and real-world use cases and applications are presented. Then efficient algorithms that can make use of the temporal information in addition to graph topological structures are designed and developed. Finally, experiments are conducted on real-world dataset to show real use case applications of our algorithms, as well as on synthetic datasets to illustrate the scalability of the proposed algorithms and its coverage on different types of graphs. |
---|