Separator Database and SPM-Tree Framework for Mining Sequential Patterns Using PrefixSpan with Pseudoprojection
Sequential pattern mining is a new branch of data mining that solves inter-transaction pattern mining problems. Efficiency and scalability in mining the complete set of patterns are the central challenges of sequential pattern mining. A comprehensive performance study has reported that PrefixSpan,...
Main Author: | Saputra, Dhany |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2008 |
Subjects: | ZA Information resources |
Online Access: | http://utpedia.utp.edu.my/8549/1/2008%20Master%20-%20Separator%20Database%20and%20SPM-Tree%20Framework%20for%20Mining%20Sequential%20Patterns%20using%20Pref.pdf http://utpedia.utp.edu.my/8549/ |
Institution: | Universiti Teknologi Petronas |
id |
my-utp-utpedia.8549 |
---|---|
record_format |
eprints |
spelling |
my-utp-utpedia.8549 2017-01-25T09:45:05Z http://utpedia.utp.edu.my/8549/ Separator Database and SPM-Tree Framework for Mining Sequential Patterns Using PrefixSpan with Pseudoprojection Saputra, Dhany ZA Information resources 2008-04 Thesis NonPeerReviewed application/pdf en http://utpedia.utp.edu.my/8549/1/2008%20Master%20-%20Separator%20Database%20and%20SPM-Tree%20Framework%20for%20Mining%20Sequential%20Patterns%20using%20Pref.pdf Saputra, Dhany (2008) Separator Database and SPM-Tree Framework for Mining Sequential Patterns Using PrefixSpan with Pseudoprojection. Masters thesis, Universiti Teknologi Petronas. |
institution |
Universiti Teknologi Petronas |
building |
UTP Resource Centre |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Petronas |
content_source |
UTP Electronic and Digitized Intellectual Asset |
url_provider |
http://utpedia.utp.edu.my/ |
language |
English |
topic |
ZA Information resources |
description |
Sequential pattern mining is a new branch of data mining that solves inter-transaction
pattern mining problems. Efficiency and scalability in mining the complete
set of patterns are the central challenges of sequential pattern mining. A comprehensive
performance study has reported that PrefixSpan, one of the sequential pattern
mining algorithms, outperforms GSP, SPADE, and FreeSpan in most cases, and that
PrefixSpan integrated with the pseudoprojection technique is the fastest among the
tested algorithms. Nevertheless, the pseudoprojection technique, which requires
maintaining and frequently visiting the in-memory sequence database until all patterns
are found, consumes a considerable amount of memory and induces the
algorithm to perform many redundant and unnecessary checks against this in-memory
copy of the original database when candidate patterns are examined. Moreover,
improper management of intermediate databases may adversely affect execution
time and memory utilization. In the present work, Separator Database is proposed to
improve PrefixSpan with pseudoprojection through early removal of the uneconomical
in-memory sequence database, whilst the SPM-Tree Framework is proposed to build the
intermediate databases. Because the index sets of longer patterns are built
using Separator Database, some procedures that depend on the in-memory
sequence database can be removed; most of the memory space can thus be released,
and the elimination of redundant checks against the in-memory sequence database
reduces execution time. By storing intermediate databases in the SPM-Tree Framework, the
sequence database can be held in memory and the index set can be built. Using
Java as a case study, a series of experiments was conducted to select suitable
classes from the Java Collections API for this framework. The experimental results
show that Separator Database consistently improves PrefixSpan with
pseudoprojection, exponentially in some cases. The results also show that in Java,
ArrayList is the most suitable choice for storing objects and ArrayIntList is the
most suitable choice for storing integer data. This novel approach of integrating
Separator Database and the SPM-Tree Framework using these choices of Java Collections
outperforms PrefixSpan with pseudoprojection in terms of CPU performance and memory
utilization. Future research includes exploring the use of Separator Database in
PrefixSpan with pseudoprojection to improve the mining of generalized sequential
patterns, particularly constrained sequential patterns. |
format |
Thesis |
author |
Saputra, Dhany |
title |
Separator Database and SPM-Tree Framework for Mining Sequential Patterns
Using PrefixSpan with Pseudoprojection |
publishDate |
2008 |
url |
http://utpedia.utp.edu.my/8549/1/2008%20Master%20-%20Separator%20Database%20and%20SPM-Tree%20Framework%20for%20Mining%20Sequential%20Patterns%20using%20Pref.pdf http://utpedia.utp.edu.my/8549/ |
_version_ |
1739831585509736448 |
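
The pseudoprojection technique named in the description can be illustrated with a minimal Java sketch. This is not the thesis implementation: it covers plain PrefixSpan with pseudoprojection only, and does not attempt Separator Database or the SPM-Tree Framework, which this record does not specify in detail. It assumes every sequence element is a single integer item (no itemsets), and the names `PseudoProjectionSketch`, `mine`, and `grow` are hypothetical. Each projected database is kept as a list of (sequence id, suffix start) pairs that point into the in-memory database rather than as copied suffixes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; assumes single-item sequence elements.
class PseudoProjectionSketch {

    // Returns every frequent sequential pattern mapped to its support count.
    static Map<List<Integer>, Integer> mine(int[][] db, int minSupport) {
        Map<List<Integer>, Integer> patterns = new LinkedHashMap<>();
        // The initial projection points at the start of every sequence.
        List<int[]> initial = new ArrayList<>();
        for (int seqId = 0; seqId < db.length; seqId++) {
            initial.add(new int[] {seqId, 0});
        }
        grow(db, minSupport, new ArrayList<>(), initial, patterns);
        return patterns;
    }

    private static void grow(int[][] db, int minSupport, List<Integer> prefix,
                             List<int[]> projected, Map<List<Integer>, Integer> out) {
        // For each item, collect one pointer per sequence, placed just after
        // the item's first occurrence in that sequence's projected suffix.
        Map<Integer, List<int[]>> next = new TreeMap<>();
        for (int[] pointer : projected) {
            int[] seq = db[pointer[0]];
            Set<Integer> seen = new HashSet<>();
            for (int pos = pointer[1]; pos < seq.length; pos++) {
                if (seen.add(seq[pos])) {
                    next.computeIfAbsent(seq[pos], k -> new ArrayList<>())
                        .add(new int[] {pointer[0], pos + 1});
                }
            }
        }
        for (Map.Entry<Integer, List<int[]>> entry : next.entrySet()) {
            // One pointer per supporting sequence, so list size equals support.
            int support = entry.getValue().size();
            if (support < minSupport) {
                continue;
            }
            List<Integer> pattern = new ArrayList<>(prefix);
            pattern.add(entry.getKey());
            out.put(pattern, support);
            grow(db, minSupport, pattern, entry.getValue(), out);
        }
    }
}
```

Note that the sketch re-reads `db` at every recursion level. This is the repeated visiting of the in-memory sequence database that the description identifies as the cost of pseudoprojection.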