Concurrent data stream mining
Given the characteristics of streaming data---read-once only and infinitely streaming, it is desirable to perform multiple, concurrent types of mining on streaming data to the fullest extent permitted by resource constraints. However, to the best of our knowledge, conventional stream mining algorit...
Main Author: | Wang, Wenwen |
---|---|
Other Authors: | Ng Wee Keong |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Database management |
Online Access: | http://hdl.handle.net/10356/16935 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-16935 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-16935 2023-03-03T20:46:00Z Concurrent data stream mining Wang, Wenwen. Ng Wee Keong School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Database management Given the characteristics of streaming data (read-once only and infinitely streaming), it is desirable to perform multiple, concurrent types of mining on streaming data to the fullest extent permitted by resource constraints. However, to the best of our knowledge, conventional stream mining algorithms have focused on single, standalone mining. In this report, we attempt to achieve concurrent classification and clustering on streaming data. Our integrated framework, MM-Stream, follows the conventional online-offline approach in stream mining. We describe the framework by dividing it into two components, an online component and an offline component: as data streams in, the online component completes all necessary processing within constant time and drops the data; when a mining request is issued by the user(s), the offline component performs the mining task(s) using the information collected by the online component. We implemented and evaluated the algorithm. The performance evaluation showed that the performance of MM-Stream is comparable to or better than that of existing standalone stream mining algorithms (D-Stream, On-Demand-Stream classifier). We investigated how the performance of such integrated mining compares with the purity and accuracy of standalone clustering and classification, respectively. The results showed that, with concurrent mining, we can achieve almost double the throughput without any degradation in the quality of the mining results. We believe that our successful experimentation with MM-Stream's concurrent stream classification and clustering paves the way for incorporating more concurrent data mining tasks to maximize the output of stream data mining. Bachelor of Engineering (Computer Engineering) 2009-05-29T02:06:36Z 2009-05-29T02:06:36Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16935 en Nanyang Technological University 77 p. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering::Information systems::Database management |
spellingShingle | DRNTU::Engineering::Computer science and engineering::Information systems::Database management Wang, Wenwen. Concurrent data stream mining |
author2 | Ng Wee Keong |
author_facet | Ng Wee Keong Wang, Wenwen. |
format | Final Year Project |
author | Wang, Wenwen. |
author_sort | Wang, Wenwen. |
title | Concurrent data stream mining |
title_sort | concurrent data stream mining |
publishDate | 2009 |
url | http://hdl.handle.net/10356/16935 |
_version_ | 1759856279784783872 |
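
The online-offline split described in the abstract can be illustrated with a small sketch. This is not the MM-Stream algorithm from the report, whose details are not given in this record. It is a minimal, hypothetical example that assumes a fixed-width grid summary: the online step does constant work per point and then discards the point, and two offline functions answer clustering and classification requests from the same summary. The names `OnlineSummary`, `offline_cluster`, `offline_classify`, `cell_width`, and `min_density` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class OnlineSummary:
    """Hypothetical online component: constant work per point, raw point is not stored."""

    def __init__(self, cell_width=1.0):
        self.cell_width = cell_width
        self.density = Counter()            # grid cell -> number of points seen
        self.labels = defaultdict(Counter)  # grid cell -> label counts (labelled points only)

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_width) for x in point)

    def update(self, point, label=None):
        # Only the cell counters change; the point itself is dropped afterwards.
        cell = self.cell_of(point)
        self.density[cell] += 1
        if label is not None:
            self.labels[cell][label] += 1


def offline_cluster(summary, min_density=2):
    """Offline clustering: merge dense cells that share a face into one cluster."""
    dense = {c for c, n in summary.density.items() if n >= min_density}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            cell = stack.pop()
            group.add(cell)
            # Visit the two neighbours along every axis.
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(group)
    return clusters


def offline_classify(summary, point):
    """Offline classification: majority label of the point's cell, or None if unlabelled."""
    counts = summary.labels.get(summary.cell_of(point))
    return counts.most_common(1)[0][0] if counts else None
```

Both offline functions read the same summary, so one pass over the stream serves two mining tasks. In this sketch that sharing is the only source of the saving; the throughput figures quoted in the abstract come from the report's own experiments, not from this code.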