Density-based clustering of data streams at multiple resolutions

In data stream clustering, it is desirable to have algorithms that are able to detect clusters of arbitrary shapes, changing clusters that evolve over time, and clusters with noise. In recent years, stream data clustering algorithms have been based on an online-offline approach: the online component captu...

Bibliographic Details
Main Author:	Wan, Li
Other Authors:	Ng Wee Keong
Format:	Student Research Poster
Language:	English
Published:	2013
Subjects:	Data Stream; Data Mining
Online Access:	https://hdl.handle.net/10356/84871 http://hdl.handle.net/10220/9065
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-84871
record_format	dspace
spelling	sg-ntu-dr.10356-84871 2020-09-27T20:30:58Z Density-based clustering of data streams at multiple resolutions Wan, Li Ng Wee Keong School of Computer Engineering Data Stream Data Mining In data stream clustering, it is desirable to have algorithms that are able to detect clusters of arbitrary shapes, changing clusters that evolve over time, and clusters with noise. In recent years, stream data clustering algorithms have been based on an online-offline approach: the online component captures synopsis information from the data stream (thus overcoming the real-time and memory constraint issues) and the offline component generates clusters using the stored synopsis. The online-offline approach affects the overall performance of stream data clustering in various ways: (1) how easily the synopsis information is derived from stream data; (2) the complexity of the data structure used to store and manage the synopsis information; and (3) the frequency with which the offline component is used to generate clusters. In this project we propose an algorithm that (1) computes and updates synopsis information in constant time; (2) allows users to discover clusters at multiple resolutions; (3) determines the right time for users to generate clusters from the synopsis information; (4) generates clusters of higher purity than existing algorithms; and (5) determines the right threshold function for density-based clustering based on the fading model of stream data. To the best of our knowledge, no existing data stream algorithm has all of these features. Experimental results show that our algorithm is able to detect arbitrarily shaped evolving clusters of high quality. [3rd Award] 2013-02-01T01:00:10Z 2019-12-06T15:52:42Z 2013-02-01T01:00:10Z 2019-12-06T15:52:42Z 2008 2008 Student Research Poster Wan, L. (2008, March). Density-based clustering of data streams at multiple resolutions. Presented at Discover URECA @ NTU poster exhibition and competition, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/84871 http://hdl.handle.net/10220/9065 en © 2008 The Author(s). application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Data Stream Data Mining
spellingShingle	Data Stream Data Mining Wan, Li Density-based clustering of data streams at multiple resolutions
description	In data stream clustering, it is desirable to have algorithms that are able to detect clusters of arbitrary shapes, changing clusters that evolve over time, and clusters with noise. In recent years, stream data clustering algorithms have been based on an online-offline approach: the online component captures synopsis information from the data stream (thus overcoming the real-time and memory constraint issues) and the offline component generates clusters using the stored synopsis. The online-offline approach affects the overall performance of stream data clustering in various ways: (1) how easily the synopsis information is derived from stream data; (2) the complexity of the data structure used to store and manage the synopsis information; and (3) the frequency with which the offline component is used to generate clusters. In this project we propose an algorithm that (1) computes and updates synopsis information in constant time; (2) allows users to discover clusters at multiple resolutions; (3) determines the right time for users to generate clusters from the synopsis information; (4) generates clusters of higher purity than existing algorithms; and (5) determines the right threshold function for density-based clustering based on the fading model of stream data. To the best of our knowledge, no existing data stream algorithm has all of these features. Experimental results show that our algorithm is able to detect arbitrarily shaped evolving clusters of high quality. [3rd Award]
author2	Ng Wee Keong
author_facet	Ng Wee Keong Wan, Li
format	Student Research Poster
author	Wan, Li
author_sort	Wan, Li
title	Density-based clustering of data streams at multiple resolutions
title_short	Density-based clustering of data streams at multiple resolutions
title_full	Density-based clustering of data streams at multiple resolutions
title_fullStr	Density-based clustering of data streams at multiple resolutions
title_full_unstemmed	Density-based clustering of data streams at multiple resolutions
title_sort	density-based clustering of data streams at multiple resolutions
publishDate	2013
url	https://hdl.handle.net/10356/84871 http://hdl.handle.net/10220/9065
_version_	1681059543660888064
