Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database

There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in seque...

Full description

Saved in:
Bibliographic Details
Main Authors: DING, Bolin, LO, David, Han, Jiawei, KHOO, Siau-Cheng
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2009
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/297
https://ink.library.smu.edu.sg/context/sis_research/article/1296/viewcontent/icde09.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-1296
record_format dspace
spelling sg-smu-ink.sis_research-12962011-11-02T09:37:18Z Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database DING, Bolin LO, David Han, Jiawei KHOO, Siau-Cheng There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) of some sequences in the database. We introduce the concept of repetitive support to measure how frequently a pattern repeats in the database. Different from the sequential pattern mining problem, repetitive support captures not only repetitions of a pattern in different sequences but also the repetitions within a sequence. Given a userspecified support threshold min sup, we study finding the set of all patterns with repetitive support no less than min sup. To obtain a compact yet complete result set and improve the efficiency, we also study finding closed patterns. Efficient mining algorithms to find the complete set of desired patterns are proposed based on the idea of instance growth. Our performance study on various datasets shows the efficiency of our approach. A case study is also performed to show the utility of our approach. 2009-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/297 info:doi/10.1109/ICDE.2009.104 https://ink.library.smu.edu.sg/context/sis_research/article/1296/viewcontent/icde09.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Software Engineering
spellingShingle Software Engineering
DING, Bolin
LO, David
Han, Jiawei
KHOO, Siau-Cheng
Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
description There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) of some sequences in the database. We introduce the concept of repetitive support to measure how frequently a pattern repeats in the database. Different from the sequential pattern mining problem, repetitive support captures not only repetitions of a pattern in different sequences but also the repetitions within a sequence. Given a userspecified support threshold min sup, we study finding the set of all patterns with repetitive support no less than min sup. To obtain a compact yet complete result set and improve the efficiency, we also study finding closed patterns. Efficient mining algorithms to find the complete set of desired patterns are proposed based on the idea of instance growth. Our performance study on various datasets shows the efficiency of our approach. A case study is also performed to show the utility of our approach.
format text
author DING, Bolin
LO, David
Han, Jiawei
KHOO, Siau-Cheng
author_facet DING, Bolin
LO, David
Han, Jiawei
KHOO, Siau-Cheng
author_sort DING, Bolin
title Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
title_short Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
title_full Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
title_fullStr Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
title_full_unstemmed Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
title_sort efficient mining of closed repetitive gapped subsequences from a sequence database
publisher Institutional Knowledge at Singapore Management University
publishDate 2009
url https://ink.library.smu.edu.sg/sis_research/297
https://ink.library.smu.edu.sg/context/sis_research/article/1296/viewcontent/icde09.pdf
_version_ 1770570379224416256