Fast Streaming k-Means Clustering with Coreset Caching

IEEE We present new algorithms for k-means clustering on a data stream with a focus on providing fast responses to clustering queries. Compared to the state-of-the-art, our algorithms provide substantial improvements in the query time for cluster centers while retaining the desirable properties of p...

Full description

Saved in:
Bibliographic Details
Main Authors: Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura
Other Authors: Mahidol University
Format: Article
Published: 2020
Subjects:
Online Access:https://repository.li.mahidol.ac.th/handle/123456789/59046
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Mahidol University
id th-mahidol.59046
record_format dspace
spelling th-mahidol.590462020-10-05T11:42:58Z Fast Streaming k-Means Clustering with Coreset Caching Yu Zhang Kanat Tangwongsan Srikanta Tirthapura Mahidol University Iowa State University Computer Science IEEE We present new algorithms for k-means clustering on a data stream with a focus on providing fast responses to clustering queries. Compared to the state-of-the-art, our algorithms provide substantial improvements in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our proposed clustering algorithms systematically reuse the "coresets" (summaries of data) computed for recent queries in answering the current clustering query, a novel technique which we refer to as coreset caching. We also present an algorithm called OnlineCC that integrates the coreset caching idea with a simple sequential streaming k-means algorithm. In practice, OnlineCC algorithm can provide constant query time. We present both theoretical analysis and detailed experiments demonstrating the correctness, accuracy, and efficiency of all our proposed clustering algorithms. 2020-10-05T04:42:58Z 2020-10-05T04:42:58Z 2020-01-01 Article IEEE Transactions on Knowledge and Data Engineering. (2020) 10.1109/TKDE.2020.3018744 15582191 10414347 2-s2.0-85090466944 https://repository.li.mahidol.ac.th/handle/123456789/59046 Mahidol University SCOPUS https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090466944&origin=inward
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Computer Science
spellingShingle Computer Science
Yu Zhang
Kanat Tangwongsan
Srikanta Tirthapura
Fast Streaming k-Means Clustering with Coreset Caching
description IEEE We present new algorithms for k-means clustering on a data stream with a focus on providing fast responses to clustering queries. Compared to the state-of-the-art, our algorithms provide substantial improvements in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our proposed clustering algorithms systematically reuse the "coresets" (summaries of data) computed for recent queries in answering the current clustering query, a novel technique which we refer to as coreset caching. We also present an algorithm called OnlineCC that integrates the coreset caching idea with a simple sequential streaming k-means algorithm. In practice, OnlineCC algorithm can provide constant query time. We present both theoretical analysis and detailed experiments demonstrating the correctness, accuracy, and efficiency of all our proposed clustering algorithms.
author2 Mahidol University
author_facet Mahidol University
Yu Zhang
Kanat Tangwongsan
Srikanta Tirthapura
format Article
author Yu Zhang
Kanat Tangwongsan
Srikanta Tirthapura
author_sort Yu Zhang
title Fast Streaming k-Means Clustering with Coreset Caching
title_short Fast Streaming k-Means Clustering with Coreset Caching
title_full Fast Streaming k-Means Clustering with Coreset Caching
title_fullStr Fast Streaming k-Means Clustering with Coreset Caching
title_full_unstemmed Fast Streaming k-Means Clustering with Coreset Caching
title_sort fast streaming k-means clustering with coreset caching
publishDate 2020
url https://repository.li.mahidol.ac.th/handle/123456789/59046
_version_ 1763490658339782656