Entropy Weighting K-Means for high-dimensional data analysis
Entropy Weighting K-Means (EWKM) clustering is a new k-means type algorithm for clustering high-dimensional objects in subspaces. For high-dimensional data, the clustering process is extended to calculate a weight for each dimension in each cluster and to use the weight values to identify the subsets of important dimen...
Main Author: | Leonel Rahman |
---|---|
Other Authors: | Chen Lihui |
Format: | Final Year Project |
Language: | English |
Published: | 2010 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
Online Access: | http://hdl.handle.net/10356/39388 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-39388 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-39388; 2023-07-07T17:15:54Z; Entropy Weighting K-Means for high-dimensional data analysis; Leonel Rahman; Chen Lihui; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems; Bachelor of Engineering; 2010-05-21T07:56:07Z; 2010; Final Year Project (FYP); http://hdl.handle.net/10356/39388; en; Nanyang Technological University; 47 p.; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
description |
Entropy Weighting K-Means (EWKM) clustering is a new k-means type algorithm for clustering high-dimensional objects in subspaces. For high-dimensional data, it extends the k-means clustering process to calculate a weight for each dimension in each cluster and uses the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. Experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. In this project the new algorithm is implemented in Java and is also scalable to large data sets [4]. However, L. Jing's paper uses Euclidean distance as the similarity measure between any two data points and only tests on low-dimensional (2-D) data. In this project, the algorithm was first applied directly to the test data set; the original EWKM expressions were then revised by introducing a cosine similarity measure, which gives more accurate clustering results. Entropy, purity and normalized mutual information (NMI) scores are calculated and used as quantitative evaluation measures of the experimental results. We analyze the parameters, study its advantages further, and compare its effectiveness with simple K-Means and the original EWKM. |
author2 |
Chen Lihui |
format |
Final Year Project |
author |
Leonel Rahman. |
title |
Entropy Weighting K-Means for high-dimensional data analysis |
publishDate |
2010 |
url |
http://hdl.handle.net/10356/39388 |
_version_ |
1772827754457202688 |
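
The description above mentions an additional step that computes a weight for every dimension in every cluster, driven by an entropy term in the objective. The record does not reproduce the project's expressions, so the following is only a minimal Java sketch under stated assumptions: it uses the softmax-style update commonly given for EWKM with Euclidean dispersion, lambda[l][t] = exp(-D[l][t] / gamma) / sum_i exp(-D[l][i] / gamma), and a hard cluster assignment. The class name `EwkmWeightSketch`, the method names, and the parameter `gamma` are hypothetical and are not taken from the project's code; the cosine-similarity revision is not shown because its formulas are not given here.

```java
import java.util.Arrays;

// Illustrative sketch only; not the project's implementation.
class EwkmWeightSketch {

    /**
     * One weight-update step. D[l][t] is the sum of squared deviations of
     * cluster l's members from its center along dimension t. Dimensions
     * with small dispersion receive large weights; gamma (> 0) controls
     * how sharply the weights concentrate.
     */
    public static double[][] updateWeights(double[][] data, int[] assignment,
                                           double[][] centers, double gamma) {
        int k = centers.length;
        int m = centers[0].length;
        double[][] dispersion = new double[k][m];
        for (int j = 0; j < data.length; j++) {
            int l = assignment[j];
            for (int t = 0; t < m; t++) {
                double diff = data[j][t] - centers[l][t];
                dispersion[l][t] += diff * diff;
            }
        }
        double[][] weights = new double[k][m];
        for (int l = 0; l < k; l++) {
            // Shift by the smallest dispersion before exponentiating;
            // the shift cancels in the ratio and avoids underflow in every term.
            double min = Arrays.stream(dispersion[l]).min().getAsDouble();
            double sum = 0.0;
            for (int t = 0; t < m; t++) {
                weights[l][t] = Math.exp(-(dispersion[l][t] - min) / gamma);
                sum += weights[l][t];
            }
            for (int t = 0; t < m; t++) {
                weights[l][t] /= sum; // weights of each cluster sum to 1
            }
        }
        return weights;
    }

    /** Weighted squared Euclidean distance, as used when assigning points. */
    public static double weightedDistance(double[] x, double[] center, double[] w) {
        double d = 0.0;
        for (int t = 0; t < x.length; t++) {
            double diff = x[t] - center[t];
            d += w[t] * diff * diff;
        }
        return d;
    }

    public static void main(String[] args) {
        // One cluster: tight along dimension 0, spread out along dimension 1.
        double[][] data = {{1.0, 0.0}, {1.1, 5.0}, {0.9, -5.0}};
        int[] assignment = {0, 0, 0};
        double[][] centers = {{1.0, 0.0}};
        double[][] w = updateWeights(data, assignment, centers, 1.0);
        System.out.println(Arrays.toString(w[0]));
    }
}
```

In this toy input the cluster is compact along the first dimension and scattered along the second, so the update places almost all of the weight on the first dimension. A larger `gamma` would spread the weights more evenly across dimensions.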