Efficient and rich graph clustering

Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and ric...

Full description

Saved in:

Bibliographic Details
Main Author:	Xu, Zhiqiang
Other Authors:	Ke Yi Ping, Kelly
Format:	Theses and Dissertations
Language:	English
Published:	2015
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles
Online Access:	http://hdl.handle.net/10356/65745
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-65745
record_format	dspace
spelling	sg-ntu-dr.10356-657452023-03-04T00:35:10Z Efficient and rich graph clustering Xu, Zhiqiang Ke Yi Ping, Kelly School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and richness in clustering with graphs. Due to the fast growth of graph size at the age of information, the efficiency issue of graph clustering algorithms is a critical concern in their real applications. By virtue of technology advancements, meanwhile, the graph data we can acquire gets increasingly rich due to the availability of abundant additional information. This gives rise to two types of rich graphs of our interest, namely, attributed graphs and multiple graphs. These new characteristics of graph data pose great challenges for existing algorithms, making them less efficient and/or less effective. Therefore, it is demanding to develop new algorithms to improve the situation. Specifically, we study three problems on efficient and rich graph clustering as follows. Most of existing algorithms for attributed graph clustering are distance-based. Distance-based methods often require a lossy data fusion process for constructing a unified measure to cover both structural and attribute information, which is prone to suffering from information loss. Alternatively, we propose a model-based approach, which seamlessly casts the two types of information into a joint probabilistic framework. The framework is not only effective and flexible for various kinds of attributed graphs, but also efficient by performing variational inference. Stochastic blockmodel (SBM) is a popular probabilistic tool for graph modeling and also a building block for advanced graph models. However, its applicability to graph clustering has gradually become limited because the existing inference methods are either computationally expensive or not guaranteed to converge. Naturally, the algorithm for attributed graph clustering built upon SBM suffers from this weakness as well. To tackle these issues, we propose to exploit the geometric structure of the underlying Riemannian space with the inference problem in deploying the nonlinear conjugate gradient method. The resultant algorithm is not only efficient and convergent, but also robust, for the community detection on graphs with/without node attributes. There also exists a lot of work on clustering with multiple graphs. They achieve the same goal of finding a consensus clustering across different graphs in different ways. Whatever the ways they choose, however, the consensus information is often coupled with domain-specific information implicitly in the modeling. With domain-specific fraction unfiltered, the quality of consensus clustering can be degraded. To reduce the coupling of domain-specific information, we extend the idea of consensus and domain-specific subspace decomposition from flat data to graph data for explicitly modeling and distinguishing the two types of information, consensus and domain-specific, in multiple graphs. And the model is solved by a novel spectral clustering algorithm with better accuracy and efficiency. Doctor of Philosophy (SCE) 2015-12-11T02:59:16Z 2015-12-11T02:59:16Z 2015 2015 Thesis Xu, Z. (2015). Efficient and rich graph clustering. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/65745 en 218 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles Xu, Zhiqiang Efficient and rich graph clustering
description	Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and richness in clustering with graphs. Due to the fast growth of graph size at the age of information, the efficiency issue of graph clustering algorithms is a critical concern in their real applications. By virtue of technology advancements, meanwhile, the graph data we can acquire gets increasingly rich due to the availability of abundant additional information. This gives rise to two types of rich graphs of our interest, namely, attributed graphs and multiple graphs. These new characteristics of graph data pose great challenges for existing algorithms, making them less efficient and/or less effective. Therefore, it is demanding to develop new algorithms to improve the situation. Specifically, we study three problems on efficient and rich graph clustering as follows. Most of existing algorithms for attributed graph clustering are distance-based. Distance-based methods often require a lossy data fusion process for constructing a unified measure to cover both structural and attribute information, which is prone to suffering from information loss. Alternatively, we propose a model-based approach, which seamlessly casts the two types of information into a joint probabilistic framework. The framework is not only effective and flexible for various kinds of attributed graphs, but also efficient by performing variational inference. Stochastic blockmodel (SBM) is a popular probabilistic tool for graph modeling and also a building block for advanced graph models. However, its applicability to graph clustering has gradually become limited because the existing inference methods are either computationally expensive or not guaranteed to converge. Naturally, the algorithm for attributed graph clustering built upon SBM suffers from this weakness as well. To tackle these issues, we propose to exploit the geometric structure of the underlying Riemannian space with the inference problem in deploying the nonlinear conjugate gradient method. The resultant algorithm is not only efficient and convergent, but also robust, for the community detection on graphs with/without node attributes. There also exists a lot of work on clustering with multiple graphs. They achieve the same goal of finding a consensus clustering across different graphs in different ways. Whatever the ways they choose, however, the consensus information is often coupled with domain-specific information implicitly in the modeling. With domain-specific fraction unfiltered, the quality of consensus clustering can be degraded. To reduce the coupling of domain-specific information, we extend the idea of consensus and domain-specific subspace decomposition from flat data to graph data for explicitly modeling and distinguishing the two types of information, consensus and domain-specific, in multiple graphs. And the model is solved by a novel spectral clustering algorithm with better accuracy and efficiency.
author2	Ke Yi Ping, Kelly
author_facet	Ke Yi Ping, Kelly Xu, Zhiqiang
format	Theses and Dissertations
author	Xu, Zhiqiang
author_sort	Xu, Zhiqiang
title	Efficient and rich graph clustering
title_short	Efficient and rich graph clustering
title_full	Efficient and rich graph clustering
title_fullStr	Efficient and rich graph clustering
title_full_unstemmed	Efficient and rich graph clustering
title_sort	efficient and rich graph clustering
publishDate	2015
url	http://hdl.handle.net/10356/65745
_version_	1759855880429371392

Efficient and rich graph clustering

Similar Items