Proximity-based k-partitions clustering with ranking for document categorization and analysis
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/103791 http://hdl.handle.net/10220/24579 |
Institution: | Nanyang Technological University |
Summary: | As one of the most fundamental yet important methods of data clustering,
the center-based partitioning approach clusters a dataset into k subsets, each of
which is represented by a centroid or medoid. In this paper, we propose a new
medoid-based k-partitions approach called Clustering Around Weighted Prototypes
(CAWP), which works with a similarity matrix. In CAWP, each cluster
is characterized by multiple objects with different representative weights. With
this new cluster representation scheme, CAWP aims to simultaneously produce
clusters of improved quality and a set of ranked representative objects for each
cluster. An efficient algorithm is derived to alternatingly update the clusters and
the representative weights of objects with respect to each cluster. An annealing-like
optimization procedure is incorporated to alleviate the local optimum problem
for better clustering results and, at the same time, to make the algorithm less
sensitive to parameter settings. Experimental results on benchmark document
datasets show that CAWP achieves favourable effectiveness and efficiency in
clustering, and also provides useful information for cluster-specified analysis. |
---|
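
The summary describes an alternating scheme over a similarity matrix: assign objects to clusters, then update per-cluster representative weights, with an annealing-like schedule. The sketch below illustrates that general idea only. It is not the CAWP update rule from the paper, which this record does not give. The softmax weighting, the geometric temperature schedule, and every name in it (`cluster_weighted_prototypes`, `t_start`, `t_end`) are assumptions chosen for illustration.

```python
import numpy as np

def cluster_weighted_prototypes(S, k, n_iter=20, t_start=1.0, t_end=0.05, seed=0):
    """Illustrative alternating scheme on a similarity matrix S (n x n).

    ASSUMPTION: the softmax weights and the temperature schedule are
    stand-ins for illustration, not the update rules derived in the paper.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    # Start from k distinct random medoids, each carrying full weight.
    W = np.zeros((k, n))
    W[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0

    # Annealing-like schedule: weights start diffuse and sharpen over time.
    for T in np.geomspace(t_start, t_end, n_iter):
        # Assignment step: weighted similarity of each object to each cluster.
        labels = np.argmax(S @ W.T, axis=1)
        # Weight step: members that are similar to their cluster weigh more.
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous weights for an empty cluster
            score = S[np.ix_(members, members)].mean(axis=0)
            w = np.exp((score - score.max()) / T)  # numerically stable softmax
            W[c] = 0.0
            W[c, members] = w / w.sum()

    # Ranked representatives: objects with non-zero weight, heaviest first.
    ranking = [np.argsort(-W[c], kind="stable")[: np.count_nonzero(W[c])]
               for c in range(k)]
    return labels, W, ranking

# Toy similarity matrix with two obvious groups of three objects each.
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)
labels, W, ranking = cluster_weighted_prototypes(S, k=2)
```

Each row of `W` sums to one, so it can be read as a ranking of how representative each object is for that cluster, which matches the kind of output the summary attributes to CAWP.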