Efficient and rich graph clustering

Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and ric...

Full description

Saved in:
Bibliographic Details
Main Author: Xu, Zhiqiang
Other Authors: Ke Yi Ping, Kelly
Format: Theses and Dissertations
Language:English
Published: 2015
Subjects:
Online Access:http://hdl.handle.net/10356/65745
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and richness in clustering with graphs. Due to the fast growth of graph size at the age of information, the efficiency issue of graph clustering algorithms is a critical concern in their real applications. By virtue of technology advancements, meanwhile, the graph data we can acquire gets increasingly rich due to the availability of abundant additional information. This gives rise to two types of rich graphs of our interest, namely, attributed graphs and multiple graphs. These new characteristics of graph data pose great challenges for existing algorithms, making them less efficient and/or less effective. Therefore, it is demanding to develop new algorithms to improve the situation. Specifically, we study three problems on efficient and rich graph clustering as follows. Most of existing algorithms for attributed graph clustering are distance-based. Distance-based methods often require a lossy data fusion process for constructing a unified measure to cover both structural and attribute information, which is prone to suffering from information loss. Alternatively, we propose a model-based approach, which seamlessly casts the two types of information into a joint probabilistic framework. The framework is not only effective and flexible for various kinds of attributed graphs, but also efficient by performing variational inference. Stochastic blockmodel (SBM) is a popular probabilistic tool for graph modeling and also a building block for advanced graph models. However, its applicability to graph clustering has gradually become limited because the existing inference methods are either computationally expensive or not guaranteed to converge. Naturally, the algorithm for attributed graph clustering built upon SBM suffers from this weakness as well. To tackle these issues, we propose to exploit the geometric structure of the underlying Riemannian space with the inference problem in deploying the nonlinear conjugate gradient method. The resultant algorithm is not only efficient and convergent, but also robust, for the community detection on graphs with/without node attributes. There also exists a lot of work on clustering with multiple graphs. They achieve the same goal of finding a consensus clustering across different graphs in different ways. Whatever the ways they choose, however, the consensus information is often coupled with domain-specific information implicitly in the modeling. With domain-specific fraction unfiltered, the quality of consensus clustering can be degraded. To reduce the coupling of domain-specific information, we extend the idea of consensus and domain-specific subspace decomposition from flat data to graph data for explicitly modeling and distinguishing the two types of information, consensus and domain-specific, in multiple graphs. And the model is solved by a novel spectral clustering algorithm with better accuracy and efficiency.