A method for k-means-like clustering of categorical data

Bibliographic Details
Main Authors:	Thu Hien Thi Nguyen, Duy Tai Dinh, Songsak Sriboonchitta, Van Nam Huynh
Format:	Journal
Published:	2020
Subjects:	Computer Science
Online Access:	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85073982951&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/67757

Description
Summary:	© 2019, Springer-Verlag GmbH Germany, part of Springer Nature. Despite recent efforts, clustering categorical and mixed data in the context of big data remains a challenge, owing to the lack of an inherently meaningful measure of similarity between categorical objects and the high computational complexity of existing clustering techniques. While the k-means method is well known for its efficiency in clustering large data sets, its restriction to numerical data prevents it from being applied to categorical data. In this paper, we aim to develop a novel extension of the k-means method for clustering categorical data, making use of an information-theoretic dissimilarity measure and a kernel-based method for representing cluster means for categorical objects. Such an approach allows us to formulate the problem of clustering categorical data in a fashion similar to k-means clustering, while the kernel-based definition of centers also provides an interpretation of cluster means that is consistent with the statistical interpretation of cluster means for numerical data. To demonstrate the performance of the new clustering method, a series of experiments on real datasets from the UCI Machine Learning Repository is conducted, and the results are compared with several previously developed algorithms for clustering categorical data.
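
The summary describes the general shape of the approach: cluster centers represented through a kernel-based estimate, and objects assigned by a dissimilarity measure inside a k-means-style loop. The sketch below illustrates that shape only. This record does not give the paper's formulas, so every specific here is an assumption: the Aitchison-Aitken-style smoothing in `smoothed_center`, the negative log-probability used in `dissimilarity`, the parameter `lam`, and the function name `kmeans_like` are all hypothetical choices, not the authors' definitions.

```python
import math
import random
from collections import Counter


def smoothed_center(rows, domains, lam):
    """Represent a non-empty cluster as one probability distribution per attribute.

    Assumption: Aitchison-Aitken-style kernel smoothing, which mixes the
    relative frequency of each value with a small uniform mass on the others.
    """
    n = len(rows)
    center = []
    for j, domain in enumerate(domains):
        counts = Counter(row[j] for row in rows)
        m = len(domain)
        dist = {}
        for value in domain:
            freq = counts[value] / n
            if m == 1:
                dist[value] = 1.0
            else:
                dist[value] = (1 - lam) * freq + (lam / (m - 1)) * (1 - freq)
        center.append(dist)
    return center


def dissimilarity(row, center):
    """Assumed measure: sum of -log p(value) over attributes (needs 0 < lam < 1)."""
    return sum(-math.log(dist[value]) for value, dist in zip(row, center))


def kmeans_like(data, k, lam=0.1, init_idx=None, max_iter=20, seed=0):
    """Alternate assignment and center updates until the labels stop changing."""
    domains = [sorted({row[j] for row in data}) for j in range(len(data[0]))]
    if init_idx is None:
        init_idx = random.Random(seed).sample(range(len(data)), k)
    # Each initial center is built from a single seed object.
    centers = [smoothed_center([data[i]], domains, lam) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(row, centers[c]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = smoothed_center(members, domains, lam)
    return labels, centers


# Toy data invented for illustration; it is not taken from the paper.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "medium", "square"),
]
labels, centers = kmeans_like(data, k=2, init_idx=[0, 3])
print(labels)
```

Because each center is a set of per-attribute probability distributions rather than a single "most frequent" value, it plays a role analogous to the numerical mean in ordinary k-means, which is the interpretation the summary alludes to.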
