Proximity-based k-partitions clustering with ranking for document categorization and analysis

Bibliographic Details
Main Authors:	Mei, Jian-Ping, Chen, Lihui
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2015
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	https://hdl.handle.net/10356/103791 http://hdl.handle.net/10220/24579
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-103791
record_format	dspace
citation	Mei, J.-P., & Chen, L. (2014). Proximity-based k-partitions clustering with ranking for document categorization and analysis. Expert Systems with Applications, 41(16), 7095-7105.
type	Journal Article, accepted version (34 p., application/pdf)
issn	0957-4174
doi	10.1016/j.eswa.2014.06.016
record dates	2015-01-12T03:20:28Z, 2019-12-06T21:20:20Z, 2020-03-07T14:02:40Z
rights	© 2014 Elsevier Ltd. This is the author-created version of a work that has been peer reviewed and accepted for publication by Expert Systems with Applications, Elsevier Ltd. It incorporates the referee's comments, but changes resulting from the publishing process, such as copyediting and structural formatting, may not be reflected in this document. The published version is available at http://dx.doi.org/10.1016/j.eswa.2014.06.016.
building	NTU Library
country	Singapore
collection	DR-NTU
description	As one of the most fundamental and important methods of data clustering, the center-based partitioning approach clusters a dataset into k subsets, each of which is represented by a centroid or medoid. In this paper, we propose a new medoid-based k-partitions approach called Clustering Around Weighted Prototypes (CAWP), which works with a similarity matrix. In CAWP, each cluster is characterized by multiple objects with different representative weights. With this new cluster representation scheme, CAWP aims to simultaneously produce clusters of improved quality and a set of ranked representative objects for each cluster. An efficient algorithm is derived to alternately update the clusters and the representative weights of objects with respect to each cluster. An annealing-like optimization procedure is incorporated to alleviate the local optimum problem, yielding better clustering results while making the algorithm less sensitive to parameter settings. Experimental results on benchmark document datasets show that CAWP achieves favourable effectiveness and efficiency in clustering and also provides useful information for cluster-specific analysis.
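The description above outlines CAWP only at a high level: alternating updates of cluster assignments and per-cluster representative weights over a similarity matrix, plus an annealing-like procedure. The paper's actual objective and update rules are not given in this record. What follows is a hypothetical Python sketch of that general idea, not the authors' algorithm. The softmax-of-mean-similarity weighting, the increasing `beta` schedule standing in for annealing, the toy matrix `S`, and the names `member_weights` and `cluster_weighted_prototypes` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical illustration only; these are NOT the CAWP update rules from the paper.


def member_weights(sim: Matrix, labels: List[int], k: int, beta: float) -> Matrix:
    """Assumed weighting: softmax over each cluster's members.

    A member's score is its mean similarity to the other members of its cluster.
    weights[c][j] is 0 for objects outside cluster c and sums to 1 inside it.
    """
    n = len(sim)
    weights = [[0.0] * n for _ in range(k)]
    for c in range(k):
        members = [j for j in range(n) if labels[j] == c]
        if not members:
            continue  # an empty cluster keeps all-zero weights
        scores = {}
        for j in members:
            others = [sim[j][m] for m in members if m != j]
            scores[j] = sum(others) / len(others) if others else 0.0
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(beta * (s - top)) for j, s in scores.items()}
        total = sum(exps.values())
        for j, e in exps.items():
            weights[c][j] = e / total
    return weights


def cluster_weighted_prototypes(
    sim: Matrix,
    k: int,
    init_labels: List[int],
    betas: Tuple[float, ...] = (0.5, 1.0, 2.0, 4.0, 8.0),
    max_iters: int = 20,
) -> Tuple[List[int], List[List[Tuple[int, float]]]]:
    """Alternate weight updates and assignments; return labels and ranked members."""
    n = len(sim)
    labels = list(init_labels)
    for beta in betas:  # assumed annealing-like schedule: weights sharpen as beta grows
        for _ in range(max_iters):
            weights = member_weights(sim, labels, k, beta)
            # Assign each object to the cluster whose weighted prototypes it is most similar to.
            new_labels = []
            for i in range(n):
                affinity = [sum(weights[c][j] * sim[i][j] for j in range(n)) for c in range(k)]
                new_labels.append(max(range(k), key=affinity.__getitem__))
            if new_labels == labels:
                break  # assignments are stable at this beta
            labels = new_labels
    # Recompute weights for the final partition, then rank each cluster's members.
    weights = member_weights(sim, labels, k, betas[-1])
    ranking = [
        sorted(((j, weights[c][j]) for j in range(n) if labels[j] == c),
               key=lambda t: t[1], reverse=True)
        for c in range(k)
    ]
    return labels, ranking


# Hypothetical toy similarity matrix: two groups, {0, 1, 2} and {3, 4, 5},
# where objects 0 and 3 are the most central member of their group.
S = [
    [1.0, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.6, 0.1, 0.1, 0.1],
    [0.9, 0.6, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0, 0.6],
    [0.1, 0.1, 0.1, 0.9, 0.6, 1.0],
]

if __name__ == "__main__":
    # Start with object 2 deliberately placed in the wrong cluster.
    labels, ranking = cluster_weighted_prototypes(S, k=2, init_labels=[0, 0, 1, 1, 1, 1])
    print(labels)  # → [0, 0, 0, 1, 1, 1]
    print(ranking[0][0][0], ranking[1][0][0])  # → 0 3
```

Because each cluster is described by a weight distribution over its members rather than by a single medoid, sorting those weights gives a ranked list of representative objects per cluster. This is the kind of output the description refers to. With a very large `beta`, the weights in this sketch concentrate on the member with the highest mean intra-cluster similarity, so the assignment step then behaves much like a similarity-based k-medoids assignment.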

