Efficient and rich graph clustering

Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and ric...

Full description

Saved in:
Bibliographic Details
Main Author: Xu, Zhiqiang
Other Authors: Ke Yi Ping, Kelly
Format: Theses and Dissertations
Language:English
Published: 2015
Subjects:
Online Access:http://hdl.handle.net/10356/65745
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-65745
record_format dspace
spelling sg-ntu-dr.10356-657452023-03-04T00:35:10Z Efficient and rich graph clustering Xu, Zhiqiang Ke Yi Ping, Kelly School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and richness in clustering with graphs. Due to the fast growth of graph size at the age of information, the efficiency issue of graph clustering algorithms is a critical concern in their real applications. By virtue of technology advancements, meanwhile, the graph data we can acquire gets increasingly rich due to the availability of abundant additional information. This gives rise to two types of rich graphs of our interest, namely, attributed graphs and multiple graphs. These new characteristics of graph data pose great challenges for existing algorithms, making them less efficient and/or less effective. Therefore, it is demanding to develop new algorithms to improve the situation. Specifically, we study three problems on efficient and rich graph clustering as follows. Most of existing algorithms for attributed graph clustering are distance-based. Distance-based methods often require a lossy data fusion process for constructing a unified measure to cover both structural and attribute information, which is prone to suffering from information loss. Alternatively, we propose a model-based approach, which seamlessly casts the two types of information into a joint probabilistic framework. The framework is not only effective and flexible for various kinds of attributed graphs, but also efficient by performing variational inference. Stochastic blockmodel (SBM) is a popular probabilistic tool for graph modeling and also a building block for advanced graph models. However, its applicability to graph clustering has gradually become limited because the existing inference methods are either computationally expensive or not guaranteed to converge. Naturally, the algorithm for attributed graph clustering built upon SBM suffers from this weakness as well. To tackle these issues, we propose to exploit the geometric structure of the underlying Riemannian space with the inference problem in deploying the nonlinear conjugate gradient method. The resultant algorithm is not only efficient and convergent, but also robust, for the community detection on graphs with/without node attributes. There also exists a lot of work on clustering with multiple graphs. They achieve the same goal of finding a consensus clustering across different graphs in different ways. Whatever the ways they choose, however, the consensus information is often coupled with domain-specific information implicitly in the modeling. With domain-specific fraction unfiltered, the quality of consensus clustering can be degraded. To reduce the coupling of domain-specific information, we extend the idea of consensus and domain-specific subspace decomposition from flat data to graph data for explicitly modeling and distinguishing the two types of information, consensus and domain-specific, in multiple graphs. And the model is solved by a novel spectral clustering algorithm with better accuracy and efficiency. Doctor of Philosophy (SCE) 2015-12-11T02:59:16Z 2015-12-11T02:59:16Z 2015 2015 Thesis Xu, Z. (2015). Efficient and rich graph clustering. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/65745 en 218 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles
Xu, Zhiqiang
Efficient and rich graph clustering
description Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and richness in clustering with graphs. Due to the fast growth of graph size at the age of information, the efficiency issue of graph clustering algorithms is a critical concern in their real applications. By virtue of technology advancements, meanwhile, the graph data we can acquire gets increasingly rich due to the availability of abundant additional information. This gives rise to two types of rich graphs of our interest, namely, attributed graphs and multiple graphs. These new characteristics of graph data pose great challenges for existing algorithms, making them less efficient and/or less effective. Therefore, it is demanding to develop new algorithms to improve the situation. Specifically, we study three problems on efficient and rich graph clustering as follows. Most of existing algorithms for attributed graph clustering are distance-based. Distance-based methods often require a lossy data fusion process for constructing a unified measure to cover both structural and attribute information, which is prone to suffering from information loss. Alternatively, we propose a model-based approach, which seamlessly casts the two types of information into a joint probabilistic framework. The framework is not only effective and flexible for various kinds of attributed graphs, but also efficient by performing variational inference. Stochastic blockmodel (SBM) is a popular probabilistic tool for graph modeling and also a building block for advanced graph models. However, its applicability to graph clustering has gradually become limited because the existing inference methods are either computationally expensive or not guaranteed to converge. Naturally, the algorithm for attributed graph clustering built upon SBM suffers from this weakness as well. To tackle these issues, we propose to exploit the geometric structure of the underlying Riemannian space with the inference problem in deploying the nonlinear conjugate gradient method. The resultant algorithm is not only efficient and convergent, but also robust, for the community detection on graphs with/without node attributes. There also exists a lot of work on clustering with multiple graphs. They achieve the same goal of finding a consensus clustering across different graphs in different ways. Whatever the ways they choose, however, the consensus information is often coupled with domain-specific information implicitly in the modeling. With domain-specific fraction unfiltered, the quality of consensus clustering can be degraded. To reduce the coupling of domain-specific information, we extend the idea of consensus and domain-specific subspace decomposition from flat data to graph data for explicitly modeling and distinguishing the two types of information, consensus and domain-specific, in multiple graphs. And the model is solved by a novel spectral clustering algorithm with better accuracy and efficiency.
author2 Ke Yi Ping, Kelly
author_facet Ke Yi Ping, Kelly
Xu, Zhiqiang
format Theses and Dissertations
author Xu, Zhiqiang
author_sort Xu, Zhiqiang
title Efficient and rich graph clustering
title_short Efficient and rich graph clustering
title_full Efficient and rich graph clustering
title_fullStr Efficient and rich graph clustering
title_full_unstemmed Efficient and rich graph clustering
title_sort efficient and rich graph clustering
publishDate 2015
url http://hdl.handle.net/10356/65745
_version_ 1759855880429371392