Data stream mining

Data streaming is an area of data mining that has been studied extensively. One problem in data streaming is detecting noise and arbitrarily shaped clusters during clustering, where basic K-Means usually fails. Some researchers have suggested density-based clustering according to a decay function; one typical example is...
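
The idea of density-based stream clustering with a decay function, mentioned in the summary above, can be illustrated with a small sketch. This is not the algorithm from the report: it is a minimal grid synopsis, assuming an exponential decay factor applied per time step, and the names `DecayedGrid`, `CellSummary`, `cell_width`, and `decay` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellSummary:
    density: float = 0.0   # decayed count of points seen in this cell
    last_update: int = 0   # time step of the most recent update


class DecayedGrid:
    """Toy grid synopsis with exponentially decayed cell densities (illustrative only)."""

    def __init__(self, cell_width: float, decay: float):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.cell_width = cell_width
        self.decay = decay
        self.cells = defaultdict(CellSummary)

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, t: int) -> None:
        # Decay the stored density for the elapsed time, then count the new point.
        cell = self.cells[self.cell_of(point)]
        cell.density = cell.density * self.decay ** (t - cell.last_update) + 1.0
        cell.last_update = t

    def density(self, key, t: int) -> float:
        # Density of a cell as seen at time t, without modifying the cell.
        cell = self.cells.get(key)
        if cell is None:
            return 0.0
        return cell.density * self.decay ** (t - cell.last_update)

    def dense_cells(self, t: int, threshold: float):
        # Cells whose decayed density at time t reaches the threshold.
        return [k for k in self.cells if self.density(k, t) >= threshold]
```

In this sketch, old points fade from a cell's density as time passes, so cells that stop receiving data drop below the threshold and can be treated as noise, while neighbouring dense cells could be merged into clusters of arbitrary shape by a separate offline step.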

Bibliographic Details
Main Author: Huang, Lelun.
Other Authors: Ng Wee Keong
Format: Final Year Project
Language: English
Published: 2010
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Online Access: http://hdl.handle.net/10356/36246
Institution: Nanyang Technological University
id sg-ntu-dr.10356-36246
record_format dspace
spelling sg-ntu-dr.10356-36246 2023-03-03T20:59:03Z Data stream mining Huang, Lelun. Ng Wee Keong School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Database management Bachelor of Engineering (Computer Engineering) 2010-04-28T08:38:21Z 2010-04-28T08:38:21Z 2010 2010 Final Year Project (FYP) http://hdl.handle.net/10356/36246 en Nanyang Technological University 48 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Database management
description Data streaming is an area of data mining that has been studied extensively. One problem in data streaming is detecting noise and arbitrarily shaped clusters during clustering, where basic K-Means usually fails. Some researchers have suggested density-based clustering according to a decay function; one typical example is D-Stream. However, its universal decay factor and its clustering at a fixed interval do not achieve optimal efficiency with regard to space and time complexity. In this report, we attempt to improve both the space and time complexity of D-Stream. Our integrated work, DCC-Stream, follows the conventional online-offline approach in stream mining. We describe our algorithm in two parts: an online part and an offline part. The online part accumulates historical data as synopsis information and uses two sentinels to detect whether the offline part should be invoked. The offline part contains two separate components: one is responsible for updating density, and the other for clustering. The experimental evaluation shows that our algorithm achieves significant improvements in both time and space usage. The results show that time usage is greatly reduced while similar purity is maintained. In addition, the algorithm achieves better space usage.
author2 Ng Wee Keong
author_facet Ng Wee Keong
Huang, Lelun.
format Final Year Project
author Huang, Lelun.
title Data stream mining
publishDate 2010
url http://hdl.handle.net/10356/36246
_version_ 1759856917123956736