Entropy Weighting K-Means for high-dimensional data analysis

Entropy Weighting K-Means (EWKM) is a new k-means-type algorithm for clustering high-dimensional objects in subspaces. For high-dimensional data, the k-means clustering process is extended to calculate a weight for each dimension in each cluster and to use the weight values to identify the subsets of important dimen...

Bibliographic Details
Main Author: Leonel Rahman.
Other Authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2010
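Subjects: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access: http://hdl.handle.net/10356/39388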
Institution: Nanyang Technological University
id sg-ntu-dr.10356-39388
record_format dspace
spelling sg-ntu-dr.10356-39388 2023-07-07T17:15:54Z Entropy Weighting K-Means for high-dimensional data analysis Leonel Rahman. Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Bachelor of Engineering 2010-05-21T07:56:07Z 2010-05-21T07:56:07Z 2010 2010 Final Year Project (FYP) http://hdl.handle.net/10356/39388 en Nanyang Technological University 47 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
description Entropy Weighting K-Means (EWKM) is a new k-means-type algorithm for clustering high-dimensional objects in subspaces. For high-dimensional data, the k-means clustering process is extended to calculate a weight for each dimension in each cluster, and the weight values are used to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. Experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. In this project the new algorithm is implemented in Java and is also scalable to large data sets [4]. However, L. Jing’s paper uses Euclidean distance as the similarity measure between any two data points and tests only on low-dimensional (2-D) data. In this project, the algorithm was first applied directly to the test data set; the original EWKM expressions were then revised by introducing a cosine similarity measure, which gives more accurate clustering results. Entropy, purity and NMI scores are calculated and used as quantitative evaluation measures of the experimental results. We analyze the parameters, further study the advantages of the revised algorithm, and compare its effectiveness with simple K-Means and the original EWKM.
author2 Chen Lihui
author_facet Chen Lihui
Leonel Rahman.
format Final Year Project
author Leonel Rahman.
author_sort Leonel Rahman.
title Entropy Weighting K-Means for high-dimensional data analysis
title_sort entropy weighting k-means for high-dimensional data analysis
publishDate 2010
url http://hdl.handle.net/10356/39388
_version_ 1772827754457202688
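
The description above mentions an additional step that computes a weight for every dimension in every cluster. A minimal sketch of that step follows, written in Python for brevity rather than the project's Java. It assumes the standard EWKM update, in which the weight of dimension j in cluster l is proportional to exp(-D_lj / gamma), where D_lj is the squared dispersion of dimension j inside cluster l. The function name `ewkm_weights`, its arguments, and the sample data are hypothetical, and the squared Euclidean dispersion used here is the original formulation, not the cosine-based revision the project reports.

```python
import math

def ewkm_weights(points, labels, centers, gamma):
    """Return one weight vector per cluster; each vector sums to 1.

    Assumed update rule: weight[l][j] = exp(-D[l][j] / gamma) / sum_t exp(-D[l][t] / gamma)
    """
    k, dims = len(centers), len(centers[0])
    # disp[l][j]: squared dispersion of dimension j among the points assigned to cluster l
    disp = [[0.0] * dims for _ in range(k)]
    for x, l in zip(points, labels):
        for j in range(dims):
            disp[l][j] += (x[j] - centers[l][j]) ** 2
    weights = []
    for l in range(k):
        # Shifting by the smallest dispersion leaves the ratios unchanged
        # and keeps exp() from underflowing for every dimension at once.
        d_min = min(disp[l])
        expo = [math.exp(-(d - d_min) / gamma) for d in disp[l]]
        total = sum(expo)
        weights.append([e / total for e in expo])
    return weights

# Hypothetical toy data: cluster 0 is tight in dimension 0, cluster 1 in dimension 1.
points = [(0.0, 0.0), (0.2, 4.0), (5.0, 1.0), (9.0, 1.2)]
labels = [0, 0, 1, 1]
centers = [(0.1, 2.0), (7.0, 1.1)]
weights = ewkm_weights(points, labels, centers, gamma=1.0)
```

Dimensions with small within-cluster dispersion receive large weights, so each cluster ends up emphasizing its own subspace. A larger gamma flattens the weights toward uniform, while a smaller gamma concentrates them on a few dimensions.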