Community discovery over attributed graphs

Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to commu...

Full description

Saved in:

Bibliographic Details
Main Author:	NIU, Yudong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	community discovery attributed graph local clustering characteristic community relational graph Artificial Intelligence and Robotics Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/etd_coll/616 https://ink.library.smu.edu.sg/context/etd_coll/article/1614/viewcontent/GPIS_AY2019_PhD_Niu_Yudong.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

id	sg-smu-ink.etd_coll-1614
record_format	dspace
spelling	sg-smu-ink.etd_coll-16142024-09-03T08:07:00Z Community discovery over attributed graphs NIU, Yudong Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to community discovery over attributed graphs, where graphs are equipped with attribute information such as node and edge types. Thus, this thesis is devoted to designing innovative solutions that can utilize the attribute information together with graph topology for community discovery. In particular, we study novel problems with efficient algorithms for both homogeneous and heterogeneous attributed graphs and discover communities that satisfy requirements from various downstream applications. First, we examine the community search problem over homogeneous attributed graphs where each node is attached with node labels describing its property. Several methods based on topology-driven community models have been proposed for community search over homogeneous attributed graphs. However, these methods require memory-intensive indexes to search for communities that comply with certain community models efficiently. To resolve this issue, we propose the index-free approach based on sweep over personalized pagerank vector computed on a weighted graph generated by reweighting each edge according to the number of query-aware motifs containing the edge. Results show that our proposal achieves upto 90% relative improvement on F1-scores while consuming 10x fewer memory than topology-driven baselines. Second, we investigate the problem of characteristic community discovery over homogeneous attributed graphs. We formulate the problem as finding the largest community, in which the query node is top-k influential, among a set of hierarchical communities generated by fusing the query attributes and graph topology. We propose the local reclustering technique to obtain attribute-related hierarchical communities efficiently. We propose the compressed influence evaluation technique to evaluate the influence. Subsequently, we design the HIMOR-index to accelerate the characteristic community queries by reusing influence values in communities that are not changed by query attributes. In experiments, our proposed method discovers larger and better characteristic communities than topology-driven baselines while achieving upto 25x speedups. Finally, we study community discovery over heterogeneous graphs and formulate the densest subgraph discovery problem over relational graphs induced by meta-paths. Existing works on community discovery assume that the data graph is materialized and available for access with low latency. However, the relational graphs need to be meticulously extracted from the heterogeneous graphs through meta-paths. To avoid this problem, we propose the sketch-based peeling algorithm, which utilizes sketches to estimate the subgraph densities to avoid materialization of the relational graphs for the densest subgraph discovery. Subsequently, we further devise a novel system through which users can implement the peeling algorithm for the densest subgraph discovery based on different density objectives. A case study on real-world heterogeneous data from Grab demonstrates that our sketch-based methods achieve 97.55% precision for fraud detection. 2024-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/616 https://ink.library.smu.edu.sg/context/etd_coll/article/1614/viewcontent/GPIS_AY2019_PhD_Niu_Yudong.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University community discovery attributed graph local clustering characteristic community relational graph Artificial Intelligence and Robotics Graphics and Human Computer Interfaces
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	community discovery attributed graph local clustering characteristic community relational graph Artificial Intelligence and Robotics Graphics and Human Computer Interfaces
spellingShingle	community discovery attributed graph local clustering characteristic community relational graph Artificial Intelligence and Robotics Graphics and Human Computer Interfaces NIU, Yudong Community discovery over attributed graphs
description	Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to community discovery over attributed graphs, where graphs are equipped with attribute information such as node and edge types. Thus, this thesis is devoted to designing innovative solutions that can utilize the attribute information together with graph topology for community discovery. In particular, we study novel problems with efficient algorithms for both homogeneous and heterogeneous attributed graphs and discover communities that satisfy requirements from various downstream applications. First, we examine the community search problem over homogeneous attributed graphs where each node is attached with node labels describing its property. Several methods based on topology-driven community models have been proposed for community search over homogeneous attributed graphs. However, these methods require memory-intensive indexes to search for communities that comply with certain community models efficiently. To resolve this issue, we propose the index-free approach based on sweep over personalized pagerank vector computed on a weighted graph generated by reweighting each edge according to the number of query-aware motifs containing the edge. Results show that our proposal achieves upto 90% relative improvement on F1-scores while consuming 10x fewer memory than topology-driven baselines. Second, we investigate the problem of characteristic community discovery over homogeneous attributed graphs. We formulate the problem as finding the largest community, in which the query node is top-k influential, among a set of hierarchical communities generated by fusing the query attributes and graph topology. We propose the local reclustering technique to obtain attribute-related hierarchical communities efficiently. We propose the compressed influence evaluation technique to evaluate the influence. Subsequently, we design the HIMOR-index to accelerate the characteristic community queries by reusing influence values in communities that are not changed by query attributes. In experiments, our proposed method discovers larger and better characteristic communities than topology-driven baselines while achieving upto 25x speedups. Finally, we study community discovery over heterogeneous graphs and formulate the densest subgraph discovery problem over relational graphs induced by meta-paths. Existing works on community discovery assume that the data graph is materialized and available for access with low latency. However, the relational graphs need to be meticulously extracted from the heterogeneous graphs through meta-paths. To avoid this problem, we propose the sketch-based peeling algorithm, which utilizes sketches to estimate the subgraph densities to avoid materialization of the relational graphs for the densest subgraph discovery. Subsequently, we further devise a novel system through which users can implement the peeling algorithm for the densest subgraph discovery based on different density objectives. A case study on real-world heterogeneous data from Grab demonstrates that our sketch-based methods achieve 97.55% precision for fraud detection.
format	text
author	NIU, Yudong
author_facet	NIU, Yudong
author_sort	NIU, Yudong
title	Community discovery over attributed graphs
title_short	Community discovery over attributed graphs
title_full	Community discovery over attributed graphs
title_fullStr	Community discovery over attributed graphs
title_full_unstemmed	Community discovery over attributed graphs
title_sort	community discovery over attributed graphs
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/etd_coll/616 https://ink.library.smu.edu.sg/context/etd_coll/article/1614/viewcontent/GPIS_AY2019_PhD_Niu_Yudong.pdf
_version_	1814047830562570240

Community discovery over attributed graphs

Similar Items