Enhanced subspace clustering

Subspace clustering overcomes the curse of dimensionality that traditional clustering suffered, by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade primary focuses on finding groups of objec...

全面介紹

Saved in:

書目詳細資料
主要作者:	Sim, Kelvin Sian Hui.
其他作者:	Vivekanand Gopalkrishnan
格式:	Theses and Dissertations
語言:	English
出版:	2012
主題:	DRNTU::Engineering::Computer science and engineering::Computing methodologies
在線閱讀:	http://hdl.handle.net/10356/48647
標簽:	添加標簽沒有標簽, 成為第一個標記此記錄!
機構:	Nanyang Technological University
語言:	English

實物特徵
總結:	Subspace clustering overcomes the curse of dimensionality that traditional clustering suffered, by finding groups of objects that are homogeneous in subspaces of the data, instead of the full space. Research on basic subspace clustering over the past decade primary focuses on finding groups of objects that are closed together in subspaces of 2D data. Recently, the proliferation of non-traditional data and the need for higher quality clustering results have shifted the research paradigm to enhanced subspace clustering, which focuses on problems that cannot be handled or solved effectively through basic subspace clustering. The problems of enhanced subspace clustering can be categorized into two main groups, handling non-traditional data and improving clustering results. We give a survey on the enhanced subspace clustering problems, desired properties that these problems sought in their solutions, and the existing solutions. We study three main problems of enhanced subspace clustering on 2D and 3D datasets: mining subspace clusters in noisy data, mining significant subspace clusters and mining semi-supervised subspace clusters. For mining subspace clusters in noisy data, we found several problems of existing approaches, such as mining incomplete and unstable results, lacking the ability to handle 3D data, and mining clusters that are non-maximal and that contain skewed noise. We propose subspace clusters that are maximal and do not contain skewed noise. We also develop algorithms which exploit the anti-monotone property of the clusters to efficiently mine the complete and stable set of results. We show the effectiveness of our solution in mining biologically significant protein clusters in protein-protein interaction data, which is notoriously noisy in nature. For mining significant subspace clusters, we formulate an information theory concept known as correlation information, to measure the significance of the subspace clusters. We propose mining subspace clusters with high correlation information, and we develop an algorithm which uses the concept of rarity to mine significant 3D subspace clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding significant (1) groups of proteins in protein-protein interaction data, (2) clusters of words and documents in word-document data and (3) in classifying an insurance data, where significant clusters are used as rules of the classifier. For mining semi-supervised subspace clusters, we propose actionable subspace clusters, which are semi-supervised subspace clusters that allow incorporation of user's knowledge, and can suggest beneficial actions to the users. We develop algorithms that use augmented Lagrangian multiplier method coupled with frequent itemset mining algorithm to efficiently mine the actionable clusters in a parameter-insensitive way. We show the effectiveness of our solution in finding actionable groups of residues in protein structural data, which are potential binding sites for drug molecules. Lastly, we present a financial data mining application on value investing, and show that our proposed algorithms outperform a famous value investment strategy in 70% of the experiments.

Enhanced subspace clustering

相似書籍