Efficient algorithms for subgraph counting and enumeration on large graphs
This thesis delves into critical challenges associated with subgraph analysis in large-scale graphs. The proliferation of data in domains such as social networks, Internet applications, and biological networks necessitates a deep understanding of relationships modeled as graphs. The complexity and s...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2024
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/176240 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | This thesis delves into critical challenges associated with subgraph analysis in large-scale graphs. The proliferation of data in domains such as social networks, Internet applications, and biological networks necessitates a deep understanding of relationships modeled as graphs. The complexity and scale of these graphs pose challenges for efficient extraction of meaningful information. The thesis focuses on three interconnected research problems: subgraph counting in fully-dynamic graph streams, k-clique listing in large general graphs, and maximal clique enumeration in large general graphs.
The first research problem addresses subgraph counting in fully-dynamic graph streams, where graphs evolve dynamically with edge insertions and deletions. Traditional ap- proaches sample edges uniformly, overlooking the varying importance of edges. The thesis introduces a novel weighted sampling framework, WSD, for fully-dynamic graph streams, which samples the edges based on their weights that indicate their importance and reflect their properties. It determines the weights of edges in a data-driven fashion, using a novel method based on reinforcement learning. We conduct extensive experi- ments to verify that our technique can produce estimates with smaller errors while often running faster compared with existing algorithms.
The second research problem focuses on k-clique listing in large general graphs, a critical task for applications such as community detection and network analysis. The state-of-the-art algorithms all adopt a branch-and-bound (BB) framework with a vertex-oriented branching strategy (called VBBkC), which forms a sub-branch by expanding a partial k-clique with a vertex. These algorithms have the time complexity of O(km(\delta/2)^(k-2)), where m is the number of edges in the graph and \delta is the degeneracy of the graph. In this thesis, we propose a BB framework with a new edge-oriented branching (called EBBkC), which forms a sub-branch by expanding a partial k-clique with two vertices that connect each other (which correspond to an edge). We explore various edge orderings for EBBkC such that it achieves a time complexity of O(m\delta + km(\tau/2)^(k-2), where \tau is an integer related to the maximum truss number of the graph and we have \tau<\delta. The time complexity of EBBkC is better than that of VBBkC algorithms for k>3 since both O(m\delta) and O(km(\tau/2)^(k-2) are bounded by O(km(\delta/2)^(k-2)). Furthermore, we develop specialized algorithms for sub-branches on dense graphs so that we can early-terminate them and apply the specialized algorithms. We conduct extensive experiments on 19 real graphs, and the results show that our newly developed EBBkC based algorithms with the early termination technique consistently and largely outperform the state-of-the- art (VBBkC based) algorithms.
The third research problem centers on maximal clique enumeration (MCE) in large general graphs, crucial for tasks like community detection and biological network analysis. Existing algorithms typically adopt the branch-and-bound framework with the vertex-oriented Bron-Kerbosch (BK) branching strategy, which forms the sub-branches by expanding the partial clique with a vertex. The state-of-the-arts have a time complexity no better than O(n\delta 3^(\delta/3)), where n is the number of vertices in the graph and \delta is the degeneracy of the graph. We introduce a novel approach, HBBMC, a hybrid framework combining vertex-oriented and edge-oriented BK branching, where the latter adopts a new branch-and-bound framework which forms the sub-branches by expanding the partial clique with a edge. This hybrid strategy enables more efficient pruning of sub-branches, resulting in a worst-case time complexity of O(m\delta + m\tau 3^(\tau/3)), where \tau is related to the maximum truss number. This theoretical result is better than that of the state-of-the-arts when \delta > \tau + (3/ln3) \rho, a condition satisfied by many real-world graphs, where \rho is the edge density of the graph defined as \rho = m/n. To further enhance efficiency, we introduce an early termination technique applicable to all branch-and-bound frameworks, which leverages the topological information of the graphs within the branches. This technique accelerates the enumeration process without affecting worst-case time complexity. Extensive experiments on real and synthetic graphs demonstrate the superior performance of our approach. The proposed HBBMC algorithm with the early termination technique consistently outperforms existing vertex-oriented algorithms.
In summary, this thesis makes significant contributions by introducing novel algo- rithms, frameworks, and methodologies to address subgraph-related problems. The pro- posed approaches show superior performance across various real-world and synthetic datasets, providing efficient solutions for graph analysis in large-scale, dynamic, and complex scenarios. |
---|