Efficient algorithms for subgraph counting and enumeration on large graphs

This thesis delves into critical challenges associated with subgraph analysis in large-scale graphs. The proliferation of data in domains such as social networks, Internet applications, and biological networks necessitates a deep understanding of relationships modeled as graphs. The complexity and s...

Full description

Saved in:
Bibliographic Details
Main Author: Wang, Kaixin
Other Authors: Long Cheng
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/176240
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-176240
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Mathematics of computing
Graph algorithms
spellingShingle Computer and Information Science
Mathematics of computing
Graph algorithms
Wang, Kaixin
Efficient algorithms for subgraph counting and enumeration on large graphs
description This thesis delves into critical challenges associated with subgraph analysis in large-scale graphs. The proliferation of data in domains such as social networks, Internet applications, and biological networks necessitates a deep understanding of relationships modeled as graphs. The complexity and scale of these graphs pose challenges for efficient extraction of meaningful information. The thesis focuses on three interconnected research problems: subgraph counting in fully-dynamic graph streams, k-clique listing in large general graphs, and maximal clique enumeration in large general graphs. The first research problem addresses subgraph counting in fully-dynamic graph streams, where graphs evolve dynamically with edge insertions and deletions. Traditional ap- proaches sample edges uniformly, overlooking the varying importance of edges. The thesis introduces a novel weighted sampling framework, WSD, for fully-dynamic graph streams, which samples the edges based on their weights that indicate their importance and reflect their properties. It determines the weights of edges in a data-driven fashion, using a novel method based on reinforcement learning. We conduct extensive experi- ments to verify that our technique can produce estimates with smaller errors while often running faster compared with existing algorithms. The second research problem focuses on k-clique listing in large general graphs, a critical task for applications such as community detection and network analysis. The state-of-the-art algorithms all adopt a branch-and-bound (BB) framework with a vertex-oriented branching strategy (called VBBkC), which forms a sub-branch by expanding a partial k-clique with a vertex. These algorithms have the time complexity of O(km(\delta/2)^(k-2)), where m is the number of edges in the graph and \delta is the degeneracy of the graph. In this thesis, we propose a BB framework with a new edge-oriented branching (called EBBkC), which forms a sub-branch by expanding a partial k-clique with two vertices that connect each other (which correspond to an edge). We explore various edge orderings for EBBkC such that it achieves a time complexity of O(m\delta + km(\tau/2)^(k-2), where \tau is an integer related to the maximum truss number of the graph and we have \tau<\delta. The time complexity of EBBkC is better than that of VBBkC algorithms for k>3 since both O(m\delta) and O(km(\tau/2)^(k-2) are bounded by O(km(\delta/2)^(k-2)). Furthermore, we develop specialized algorithms for sub-branches on dense graphs so that we can early-terminate them and apply the specialized algorithms. We conduct extensive experiments on 19 real graphs, and the results show that our newly developed EBBkC based algorithms with the early termination technique consistently and largely outperform the state-of-the- art (VBBkC based) algorithms. The third research problem centers on maximal clique enumeration (MCE) in large general graphs, crucial for tasks like community detection and biological network analysis. Existing algorithms typically adopt the branch-and-bound framework with the vertex-oriented Bron-Kerbosch (BK) branching strategy, which forms the sub-branches by expanding the partial clique with a vertex. The state-of-the-arts have a time complexity no better than O(n\delta 3^(\delta/3)), where n is the number of vertices in the graph and \delta is the degeneracy of the graph. We introduce a novel approach, HBBMC, a hybrid framework combining vertex-oriented and edge-oriented BK branching, where the latter adopts a new branch-and-bound framework which forms the sub-branches by expanding the partial clique with a edge. This hybrid strategy enables more efficient pruning of sub-branches, resulting in a worst-case time complexity of O(m\delta + m\tau 3^(\tau/3)), where \tau is related to the maximum truss number. This theoretical result is better than that of the state-of-the-arts when \delta > \tau + (3/ln3) \rho, a condition satisfied by many real-world graphs, where \rho is the edge density of the graph defined as \rho = m/n. To further enhance efficiency, we introduce an early termination technique applicable to all branch-and-bound frameworks, which leverages the topological information of the graphs within the branches. This technique accelerates the enumeration process without affecting worst-case time complexity. Extensive experiments on real and synthetic graphs demonstrate the superior performance of our approach. The proposed HBBMC algorithm with the early termination technique consistently outperforms existing vertex-oriented algorithms. In summary, this thesis makes significant contributions by introducing novel algo- rithms, frameworks, and methodologies to address subgraph-related problems. The pro- posed approaches show superior performance across various real-world and synthetic datasets, providing efficient solutions for graph analysis in large-scale, dynamic, and complex scenarios.
author2 Long Cheng
author_facet Long Cheng
Wang, Kaixin
format Thesis-Doctor of Philosophy
author Wang, Kaixin
author_sort Wang, Kaixin
title Efficient algorithms for subgraph counting and enumeration on large graphs
title_short Efficient algorithms for subgraph counting and enumeration on large graphs
title_full Efficient algorithms for subgraph counting and enumeration on large graphs
title_fullStr Efficient algorithms for subgraph counting and enumeration on large graphs
title_full_unstemmed Efficient algorithms for subgraph counting and enumeration on large graphs
title_sort efficient algorithms for subgraph counting and enumeration on large graphs
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/176240
_version_ 1814047436037947392
spelling sg-ntu-dr.10356-1762402024-06-03T06:51:19Z Efficient algorithms for subgraph counting and enumeration on large graphs Wang, Kaixin Long Cheng School of Computer Science and Engineering c.long@ntu.edu.sg Computer and Information Science Mathematics of computing Graph algorithms This thesis delves into critical challenges associated with subgraph analysis in large-scale graphs. The proliferation of data in domains such as social networks, Internet applications, and biological networks necessitates a deep understanding of relationships modeled as graphs. The complexity and scale of these graphs pose challenges for efficient extraction of meaningful information. The thesis focuses on three interconnected research problems: subgraph counting in fully-dynamic graph streams, k-clique listing in large general graphs, and maximal clique enumeration in large general graphs. The first research problem addresses subgraph counting in fully-dynamic graph streams, where graphs evolve dynamically with edge insertions and deletions. Traditional ap- proaches sample edges uniformly, overlooking the varying importance of edges. The thesis introduces a novel weighted sampling framework, WSD, for fully-dynamic graph streams, which samples the edges based on their weights that indicate their importance and reflect their properties. It determines the weights of edges in a data-driven fashion, using a novel method based on reinforcement learning. We conduct extensive experi- ments to verify that our technique can produce estimates with smaller errors while often running faster compared with existing algorithms. The second research problem focuses on k-clique listing in large general graphs, a critical task for applications such as community detection and network analysis. The state-of-the-art algorithms all adopt a branch-and-bound (BB) framework with a vertex-oriented branching strategy (called VBBkC), which forms a sub-branch by expanding a partial k-clique with a vertex. These algorithms have the time complexity of O(km(\delta/2)^(k-2)), where m is the number of edges in the graph and \delta is the degeneracy of the graph. In this thesis, we propose a BB framework with a new edge-oriented branching (called EBBkC), which forms a sub-branch by expanding a partial k-clique with two vertices that connect each other (which correspond to an edge). We explore various edge orderings for EBBkC such that it achieves a time complexity of O(m\delta + km(\tau/2)^(k-2), where \tau is an integer related to the maximum truss number of the graph and we have \tau<\delta. The time complexity of EBBkC is better than that of VBBkC algorithms for k>3 since both O(m\delta) and O(km(\tau/2)^(k-2) are bounded by O(km(\delta/2)^(k-2)). Furthermore, we develop specialized algorithms for sub-branches on dense graphs so that we can early-terminate them and apply the specialized algorithms. We conduct extensive experiments on 19 real graphs, and the results show that our newly developed EBBkC based algorithms with the early termination technique consistently and largely outperform the state-of-the- art (VBBkC based) algorithms. The third research problem centers on maximal clique enumeration (MCE) in large general graphs, crucial for tasks like community detection and biological network analysis. Existing algorithms typically adopt the branch-and-bound framework with the vertex-oriented Bron-Kerbosch (BK) branching strategy, which forms the sub-branches by expanding the partial clique with a vertex. The state-of-the-arts have a time complexity no better than O(n\delta 3^(\delta/3)), where n is the number of vertices in the graph and \delta is the degeneracy of the graph. We introduce a novel approach, HBBMC, a hybrid framework combining vertex-oriented and edge-oriented BK branching, where the latter adopts a new branch-and-bound framework which forms the sub-branches by expanding the partial clique with a edge. This hybrid strategy enables more efficient pruning of sub-branches, resulting in a worst-case time complexity of O(m\delta + m\tau 3^(\tau/3)), where \tau is related to the maximum truss number. This theoretical result is better than that of the state-of-the-arts when \delta > \tau + (3/ln3) \rho, a condition satisfied by many real-world graphs, where \rho is the edge density of the graph defined as \rho = m/n. To further enhance efficiency, we introduce an early termination technique applicable to all branch-and-bound frameworks, which leverages the topological information of the graphs within the branches. This technique accelerates the enumeration process without affecting worst-case time complexity. Extensive experiments on real and synthetic graphs demonstrate the superior performance of our approach. The proposed HBBMC algorithm with the early termination technique consistently outperforms existing vertex-oriented algorithms. In summary, this thesis makes significant contributions by introducing novel algo- rithms, frameworks, and methodologies to address subgraph-related problems. The pro- posed approaches show superior performance across various real-world and synthetic datasets, providing efficient solutions for graph analysis in large-scale, dynamic, and complex scenarios. Doctor of Philosophy 2024-05-15T00:35:11Z 2024-05-15T00:35:11Z 2024 Thesis-Doctor of Philosophy Wang, K. (2024). Efficient algorithms for subgraph counting and enumeration on large graphs. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176240 https://hdl.handle.net/10356/176240 10.32657/10356/176240 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University