Entropy Weighting K-Means for high-dimensional data analysis

Entropy Weighting K-Means (EWKM) clustering is a new k-means type algorithm for clustering high-dimensional objects in subspaces. For high-dimensional data, the clustering process calculates a weight for each dimension in each cluster and uses the weight values to identify the subsets of important dimen...

Bibliographic Details
Main Author:	Leonel Rahman.
Other Authors:	Chen Lihui
Format:	Final Year Project
Language:	English
Published:	2010
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	http://hdl.handle.net/10356/39388
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-39388
record_format	dspace
spelling	sg-ntu-dr.10356-39388 2023-07-07T17:15:54Z Entropy Weighting K-Means for high-dimensional data analysis Leonel Rahman. Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering 2010-05-21T07:56:07Z 2010-05-21T07:56:07Z 2010 2010 Final Year Project (FYP) http://hdl.handle.net/10356/39388 en Nanyang Technological University 47 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Leonel Rahman. Entropy Weighting K-Means for high-dimensional data analysis
description	Entropy Weighting K-Means (EWKM) clustering is a new k-means type algorithm for clustering high-dimensional objects in subspaces. For high-dimensional data, the clustering process calculates a weight for each dimension in each cluster and uses the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. Experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. In this project the new algorithm is implemented in Java and is also scalable to large data sets [4]. However, L. Jing's paper computes Euclidean distance as the similarity measure between any two data points and tests only on low-dimensional (2-D) data. In this project, the algorithm was first applied directly to the test data set; the original EWKM expressions were then revised by introducing a cosine similarity measure, which gives better clustering accuracy. Entropy, purity and NMI scores are calculated and used as quantitative evaluation measures of the experimental results. We analyze the parameters, further study the advantages of the revised algorithm, and compare its effectiveness with simple K-Means and the original EWKM.
author2	Chen Lihui
author_facet	Chen Lihui Leonel Rahman.
format	Final Year Project
author	Leonel Rahman.
author_sort	Leonel Rahman.
title	Entropy Weighting K-Means for high-dimensional data analysis
title_short	Entropy Weighting K-Means for high-dimensional data analysis
title_full	Entropy Weighting K-Means for high-dimensional data analysis
title_fullStr	Entropy Weighting K-Means for high-dimensional data analysis
title_full_unstemmed	Entropy Weighting K-Means for high-dimensional data analysis
title_sort	entropy weighting k-means for high-dimensional data analysis
publishDate	2010
url	http://hdl.handle.net/10356/39388
_version_	1772827754457202688
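
The dimension-weighting step mentioned in the description can be illustrated with a minimal Python sketch. It assumes the update from the published EWKM formulation, in which the weight of dimension i in cluster l is exp(-D_li / gamma) normalised over all dimensions of that cluster, with D_li the summed squared Euclidean deviation from the cluster centre. The function name `ewkm_weights`, the parameter `gamma`, and the toy data are hypothetical and are not taken from the project's Java implementation or from its cosine-similarity revision.

```python
import math


def ewkm_weights(points, labels, centers, gamma):
    """Return one weight vector per cluster; each vector sums to 1.

    Assumed update rule: w[l][i] = exp(-D[l][i] / gamma) / sum_t exp(-D[l][t] / gamma),
    where D[l][i] is the dispersion of cluster l along dimension i.
    """
    k, dims = len(centers), len(centers[0])
    # dispersion[l][i]: summed squared deviation of cluster l along dimension i
    dispersion = [[0.0] * dims for _ in range(k)]
    for x, l in zip(points, labels):
        for i in range(dims):
            dispersion[l][i] += (x[i] - centers[l][i]) ** 2

    weights = []
    for l in range(k):
        # Shift by the smallest dispersion before exponentiating for numerical
        # stability; the shift cancels in the normalisation.
        d_min = min(dispersion[l])
        expo = [math.exp(-(d - d_min) / gamma) for d in dispersion[l]]
        total = sum(expo)
        weights.append([e / total for e in expo])
    return weights


# Toy data (illustrative only): cluster 0 is tight along dimension 0,
# cluster 1 is tight along dimension 1.
points = [(0.0, 0.0), (0.1, 2.0), (5.0, 5.0), (7.0, 5.1)]
labels = [0, 0, 1, 1]
centers = [(0.05, 1.0), (6.0, 5.05)]
weights = ewkm_weights(points, labels, centers, gamma=1.0)
```

In this sketch, a dimension along which a cluster is compact receives a larger weight, and a larger `gamma` pushes the weights of a cluster toward uniform.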
