Enhanced subspace clustering

Subspace clustering overcomes the curse of dimensionality that traditional clustering suffers from by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade has primarily focused on finding groups of objec...

Bibliographic Details
Main Author: Sim, Kelvin Sian Hui.
Other Authors: Vivekanand Gopalkrishnan
Format: Theses and Dissertations
Language: English
Published: 2012
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies
Online Access: http://hdl.handle.net/10356/48647
Institution: Nanyang Technological University
id sg-ntu-dr.10356-48647
record_format dspace
spelling sg-ntu-dr.10356-48647 2023-03-04T00:35:13Z Enhanced subspace clustering Sim, Kelvin Sian Hui. Vivekanand Gopalkrishnan School of Computer Engineering Centre for Advanced Information Systems Cong Gao DRNTU::Engineering::Computer science and engineering::Computing methodologies Doctor of Philosophy 2012-05-04T08:17:17Z 2012-05-04T08:17:17Z 2012 2012 Thesis http://hdl.handle.net/10356/48647 en 296 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies
spellingShingle DRNTU::Engineering::Computer science and engineering::Computing methodologies
Sim, Kelvin Sian Hui.
Enhanced subspace clustering
description Subspace clustering overcomes the curse of dimensionality that traditional clustering suffers from by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade has primarily focused on finding groups of objects that are close together in subspaces of 2D data. Recently, the proliferation of non-traditional data and the need for higher-quality clustering results have shifted the research paradigm to enhanced subspace clustering, which focuses on problems that cannot be handled or solved effectively through basic subspace clustering. The problems of enhanced subspace clustering can be categorized into two main groups: handling non-traditional data and improving clustering results. We give a survey of the enhanced subspace clustering problems, the properties desired of their solutions, and the existing solutions. We study three main problems of enhanced subspace clustering on 2D and 3D datasets: mining subspace clusters in noisy data, mining significant subspace clusters, and mining semi-supervised subspace clusters. For mining subspace clusters in noisy data, we identify several problems with existing approaches, such as mining incomplete and unstable results, lacking the ability to handle 3D data, and mining clusters that are non-maximal and that contain skewed noise. We propose subspace clusters that are maximal and do not contain skewed noise. We also develop algorithms that exploit the anti-monotone property of the clusters to efficiently mine the complete and stable set of results. We show the effectiveness of our solution in mining biologically significant protein clusters in protein-protein interaction data, which is notoriously noisy. For mining significant subspace clusters, we formulate an information-theoretic concept known as correlation information to measure the significance of subspace clusters. We propose mining subspace clusters with high correlation information, and we develop an algorithm that uses the concept of rarity to mine significant 3D subspace clusters in a parameter-insensitive way. We show the effectiveness of our solution in (1) finding significant groups of proteins in protein-protein interaction data, (2) finding significant clusters of words and documents in word-document data, and (3) classifying an insurance dataset, where significant clusters are used as rules of the classifier. For mining semi-supervised subspace clusters, we propose actionable subspace clusters, which are semi-supervised subspace clusters that allow the incorporation of users' knowledge and can suggest beneficial actions to users. We develop algorithms that use the augmented Lagrangian multiplier method coupled with a frequent itemset mining algorithm to efficiently mine the actionable clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding actionable groups of residues in protein structural data, which are potential binding sites for drug molecules. Lastly, we present a financial data mining application on value investing and show that our proposed algorithms outperform a well-known value investment strategy in 70% of the experiments.
author2 Vivekanand Gopalkrishnan
author_facet Vivekanand Gopalkrishnan
Sim, Kelvin Sian Hui.
format Theses and Dissertations
author Sim, Kelvin Sian Hui.
author_sort Sim, Kelvin Sian Hui.
title Enhanced subspace clustering
title_short Enhanced subspace clustering
title_full Enhanced subspace clustering
title_fullStr Enhanced subspace clustering
title_full_unstemmed Enhanced subspace clustering
title_sort enhanced subspace clustering
publishDate 2012
url http://hdl.handle.net/10356/48647
_version_ 1759856052978843648