Community discovery over attributed graphs

Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to commu...

Full description

Saved in:
Bibliographic Details
Main Author: NIU, Yudong
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/etd_coll/616
https://ink.library.smu.edu.sg/context/etd_coll/article/1614/viewcontent/GPIS_AY2019_PhD_Niu_Yudong.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.etd_coll-1614
record_format dspace
spelling sg-smu-ink.etd_coll-16142024-09-03T08:07:00Z Community discovery over attributed graphs NIU, Yudong Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to community discovery over attributed graphs, where graphs are equipped with attribute information such as node and edge types. Thus, this thesis is devoted to designing innovative solutions that can utilize the attribute information together with graph topology for community discovery. In particular, we study novel problems with efficient algorithms for both homogeneous and heterogeneous attributed graphs and discover communities that satisfy requirements from various downstream applications. First, we examine the community search problem over homogeneous attributed graphs where each node is attached with node labels describing its property. Several methods based on topology-driven community models have been proposed for community search over homogeneous attributed graphs. However, these methods require memory-intensive indexes to search for communities that comply with certain community models efficiently. To resolve this issue, we propose the index-free approach based on sweep over personalized pagerank vector computed on a weighted graph generated by reweighting each edge according to the number of query-aware motifs containing the edge. Results show that our proposal achieves upto 90% relative improvement on F1-scores while consuming 10x fewer memory than topology-driven baselines. Second, we investigate the problem of characteristic community discovery over homogeneous attributed graphs. We formulate the problem as finding the largest community, in which the query node is top-k influential, among a set of hierarchical communities generated by fusing the query attributes and graph topology. We propose the local reclustering technique to obtain attribute-related hierarchical communities efficiently. We propose the compressed influence evaluation technique to evaluate the influence. Subsequently, we design the HIMOR-index to accelerate the characteristic community queries by reusing influence values in communities that are not changed by query attributes. In experiments, our proposed method discovers larger and better characteristic communities than topology-driven baselines while achieving upto 25x speedups. Finally, we study community discovery over heterogeneous graphs and formulate the densest subgraph discovery problem over relational graphs induced by meta-paths. Existing works on community discovery assume that the data graph is materialized and available for access with low latency. However, the relational graphs need to be meticulously extracted from the heterogeneous graphs through meta-paths. To avoid this problem, we propose the sketch-based peeling algorithm, which utilizes sketches to estimate the subgraph densities to avoid materialization of the relational graphs for the densest subgraph discovery. Subsequently, we further devise a novel system through which users can implement the peeling algorithm for the densest subgraph discovery based on different density objectives. A case study on real-world heterogeneous data from Grab demonstrates that our sketch-based methods achieve 97.55% precision for fraud detection. 2024-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/616 https://ink.library.smu.edu.sg/context/etd_coll/article/1614/viewcontent/GPIS_AY2019_PhD_Niu_Yudong.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University community discovery attributed graph local clustering characteristic community relational graph Artificial Intelligence and Robotics Graphics and Human Computer Interfaces
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic community discovery
attributed graph
local clustering
characteristic community
relational graph
Artificial Intelligence and Robotics
Graphics and Human Computer Interfaces
spellingShingle community discovery
attributed graph
local clustering
characteristic community
relational graph
Artificial Intelligence and Robotics
Graphics and Human Computer Interfaces
NIU, Yudong
Community discovery over attributed graphs
description Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to community discovery over attributed graphs, where graphs are equipped with attribute information such as node and edge types. Thus, this thesis is devoted to designing innovative solutions that can utilize the attribute information together with graph topology for community discovery. In particular, we study novel problems with efficient algorithms for both homogeneous and heterogeneous attributed graphs and discover communities that satisfy requirements from various downstream applications. First, we examine the community search problem over homogeneous attributed graphs where each node is attached with node labels describing its property. Several methods based on topology-driven community models have been proposed for community search over homogeneous attributed graphs. However, these methods require memory-intensive indexes to search for communities that comply with certain community models efficiently. To resolve this issue, we propose the index-free approach based on sweep over personalized pagerank vector computed on a weighted graph generated by reweighting each edge according to the number of query-aware motifs containing the edge. Results show that our proposal achieves upto 90% relative improvement on F1-scores while consuming 10x fewer memory than topology-driven baselines. Second, we investigate the problem of characteristic community discovery over homogeneous attributed graphs. We formulate the problem as finding the largest community, in which the query node is top-k influential, among a set of hierarchical communities generated by fusing the query attributes and graph topology. We propose the local reclustering technique to obtain attribute-related hierarchical communities efficiently. We propose the compressed influence evaluation technique to evaluate the influence. Subsequently, we design the HIMOR-index to accelerate the characteristic community queries by reusing influence values in communities that are not changed by query attributes. In experiments, our proposed method discovers larger and better characteristic communities than topology-driven baselines while achieving upto 25x speedups. Finally, we study community discovery over heterogeneous graphs and formulate the densest subgraph discovery problem over relational graphs induced by meta-paths. Existing works on community discovery assume that the data graph is materialized and available for access with low latency. However, the relational graphs need to be meticulously extracted from the heterogeneous graphs through meta-paths. To avoid this problem, we propose the sketch-based peeling algorithm, which utilizes sketches to estimate the subgraph densities to avoid materialization of the relational graphs for the densest subgraph discovery. Subsequently, we further devise a novel system through which users can implement the peeling algorithm for the densest subgraph discovery based on different density objectives. A case study on real-world heterogeneous data from Grab demonstrates that our sketch-based methods achieve 97.55% precision for fraud detection.
format text
author NIU, Yudong
author_facet NIU, Yudong
author_sort NIU, Yudong
title Community discovery over attributed graphs
title_short Community discovery over attributed graphs
title_full Community discovery over attributed graphs
title_fullStr Community discovery over attributed graphs
title_full_unstemmed Community discovery over attributed graphs
title_sort community discovery over attributed graphs
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/etd_coll/616
https://ink.library.smu.edu.sg/context/etd_coll/article/1614/viewcontent/GPIS_AY2019_PhD_Niu_Yudong.pdf
_version_ 1814047830562570240