Enhanced subspace clustering

Subspace clustering overcomes the curse of dimensionality that traditional clustering suffered, by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade primary focuses on finding groups of objec...

Full description

Saved in:

Bibliographic Details
Main Author:	Sim, Kelvin Sian Hui.
Other Authors:	Vivekanand Gopalkrishnan
Format:	Theses and Dissertations
Language:	English
Published:	2012
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies
Online Access:	http://hdl.handle.net/10356/48647
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-48647
record_format	dspace
spelling	sg-ntu-dr.10356-486472023-03-04T00:35:13Z Enhanced subspace clustering Sim, Kelvin Sian Hui. Vivekanand Gopalkrishnan School of Computer Engineering Centre for Advanced Information Systems Cong Gao DRNTU::Engineering::Computer science and engineering::Computing methodologies Subspace clustering overcomes the curse of dimensionality that traditional clustering suffered, by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade primary focuses on finding groups of objects that are closed together in subspaces of 2D data. Recently, the proliferation of non-traditional data and the need for higher quality clustering results have shifted the research paradigm to enhanced subspace clustering, which focuses on problems that cannot be handled or solved effectively through basic subspace clustering. The problems of enhanced subspace clustering can be categorized into two main groups, handling non-traditional data and improving clustering results. We give a survey on the enhanced subspace clustering problems, desired properties that these problems sought in their solutions, and the existing solutions. We study three main problems of enhanced subspace clustering on 2D and 3D datasets: mining subspace clusters in noisy data, mining significant subspace clusters and mining semi-supervised subspace clusters. For mining subspace clusters in noisy data, we found several problems of existing approaches, such as mining incomplete and unstable results, lacking the ability to handle 3D data, and mining clusters that are non-maximal and that contain skewed noise. We propose subspace clusters that are maximal and do not contain skewed noise. We also develop algorithms which exploit the anti-monotone property of the clusters to efficiently mine the complete and stable set of results. We show the effectiveness of our solution in mining biologically significant protein clusters in protein-protein interaction data, which is notoriously noisy in nature. For mining significant subspace clusters, we formulate an information theory concept known as correlation information, to measure the significance of the subspace clusters. We propose mining subspace clusters with high correlation information, and we develop an algorithm which uses the concept of rarity to mine significant 3D subspace clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding significant (1) groups of proteins in protein-protein interaction data, (2) clusters of words and documents in word-document data and (3) in classifying an insurance data, where significant clusters are used as rules of the classifier. For mining semi-supervised subspace clusters, we propose actionable subspace clusters, which are semi-supervised subspace clusters that allow incorporation of user's knowledge, and can suggest beneficial actions to the users. We develop algorithms that use augmented Lagrangian multiplier method coupled with frequent itemset mining algorithm to efficiently mine the actionable clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding actionable groups of residues in protein structural data, which are potential binding sites for drug molecules. Lastly, we present a financial data mining application on value investing, and show that our proposed algorithms outperform a famous value investment strategy in 70% of the experiments. Doctor of Philosophy 2012-05-04T08:17:17Z 2012-05-04T08:17:17Z 2012 2012 Thesis http://hdl.handle.net/10356/48647 en 296 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies
spellingShingle	DRNTU::Engineering::Computer science and engineering::Computing methodologies Sim, Kelvin Sian Hui. Enhanced subspace clustering
description	Subspace clustering overcomes the curse of dimensionality that traditional clustering suffered, by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade primary focuses on finding groups of objects that are closed together in subspaces of 2D data. Recently, the proliferation of non-traditional data and the need for higher quality clustering results have shifted the research paradigm to enhanced subspace clustering, which focuses on problems that cannot be handled or solved effectively through basic subspace clustering. The problems of enhanced subspace clustering can be categorized into two main groups, handling non-traditional data and improving clustering results. We give a survey on the enhanced subspace clustering problems, desired properties that these problems sought in their solutions, and the existing solutions. We study three main problems of enhanced subspace clustering on 2D and 3D datasets: mining subspace clusters in noisy data, mining significant subspace clusters and mining semi-supervised subspace clusters. For mining subspace clusters in noisy data, we found several problems of existing approaches, such as mining incomplete and unstable results, lacking the ability to handle 3D data, and mining clusters that are non-maximal and that contain skewed noise. We propose subspace clusters that are maximal and do not contain skewed noise. We also develop algorithms which exploit the anti-monotone property of the clusters to efficiently mine the complete and stable set of results. We show the effectiveness of our solution in mining biologically significant protein clusters in protein-protein interaction data, which is notoriously noisy in nature. For mining significant subspace clusters, we formulate an information theory concept known as correlation information, to measure the significance of the subspace clusters. We propose mining subspace clusters with high correlation information, and we develop an algorithm which uses the concept of rarity to mine significant 3D subspace clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding significant (1) groups of proteins in protein-protein interaction data, (2) clusters of words and documents in word-document data and (3) in classifying an insurance data, where significant clusters are used as rules of the classifier. For mining semi-supervised subspace clusters, we propose actionable subspace clusters, which are semi-supervised subspace clusters that allow incorporation of user's knowledge, and can suggest beneficial actions to the users. We develop algorithms that use augmented Lagrangian multiplier method coupled with frequent itemset mining algorithm to efficiently mine the actionable clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding actionable groups of residues in protein structural data, which are potential binding sites for drug molecules. Lastly, we present a financial data mining application on value investing, and show that our proposed algorithms outperform a famous value investment strategy in 70% of the experiments.
author2	Vivekanand Gopalkrishnan
author_facet	Vivekanand Gopalkrishnan Sim, Kelvin Sian Hui.
format	Theses and Dissertations
author	Sim, Kelvin Sian Hui.
author_sort	Sim, Kelvin Sian Hui.
title	Enhanced subspace clustering
title_short	Enhanced subspace clustering
title_full	Enhanced subspace clustering
title_fullStr	Enhanced subspace clustering
title_full_unstemmed	Enhanced subspace clustering
title_sort	enhanced subspace clustering
publishDate	2012
url	http://hdl.handle.net/10356/48647
_version_	1759856052978843648

Enhanced subspace clustering

Similar Items