Mining Closed Discriminative Dyadic Sequential Patterns

A lot of data are in sequential formats. In this study, we are interested in sequential data that goes in pairs. There are many interesting datasets in this format coming from various domains including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. O...

Bibliographic Details
Main Authors:	LO, David; CHENG, Hong; Lucia, -
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2011
Subjects:	Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/1358 http://dx.doi.org/10.1145/1951365.1951371
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-2357
record_format	dspace
spelling	sg-smu-ink.sis_research-23572011-05-18T09:44:50Z Mining Closed Discriminative Dyadic Sequential Patterns LO, David CHENG, Hong Lucia, - A lot of data are in sequential formats. In this study, we are interested in sequential data that goes in pairs. There are many interesting datasets in this format coming from various domains including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs each belonging to one of the two classes +ve and -ve. These dyadic sequential patterns characterize the discriminating facets contrasting the two classes. They are potentially good features to be used for the classification of dyadic sequential data. They can be used to characterize and flag correct and incorrect translations from parallel textual corpora, automate the manual and time consuming duplicate bug report detection process, etc. We provide a solution of this new problem by proposing new search space traversal strategy, projected database structure, pruning properties, and novel mining algorithms. To demonstrate the scalability and utility of our solution, we have experimented with both synthetic and real datasets. Experiment results show that our solution is scalable. Mined patterns are also able to improve the accuracy of one possible downstream application, namely the detection of duplicate bug reports using pattern-based classification. 2011-03-01T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/1358 info:doi/10.1145/1951365.1951371 http://dx.doi.org/10.1145/1951365.1951371 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Software Engineering
spellingShingle	Software Engineering LO, David CHENG, Hong Lucia, - Mining Closed Discriminative Dyadic Sequential Patterns
description	A lot of data are in sequential formats. In this study, we are interested in sequential data that come in pairs. There are many interesting datasets in this format coming from various domains including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs, each belonging to one of two classes, +ve and -ve. These dyadic sequential patterns characterize the discriminating facets contrasting the two classes. They are potentially good features to be used for the classification of dyadic sequential data. They can be used to characterize and flag correct and incorrect translations from parallel textual corpora, automate the manual and time-consuming duplicate bug report detection process, etc. We provide a solution to this new problem by proposing a new search space traversal strategy, a projected database structure, pruning properties, and novel mining algorithms. To demonstrate the scalability and utility of our solution, we have experimented with both synthetic and real datasets. Experimental results show that our solution is scalable. Mined patterns are also able to improve the accuracy of one possible downstream application, namely the detection of duplicate bug reports using pattern-based classification.
format	text
author	LO, David CHENG, Hong Lucia, -
author_facet	LO, David CHENG, Hong Lucia, -
author_sort	LO, David
title	Mining Closed Discriminative Dyadic Sequential Patterns
title_short	Mining Closed Discriminative Dyadic Sequential Patterns
title_full	Mining Closed Discriminative Dyadic Sequential Patterns
title_fullStr	Mining Closed Discriminative Dyadic Sequential Patterns
title_full_unstemmed	Mining Closed Discriminative Dyadic Sequential Patterns
title_sort	mining closed discriminative dyadic sequential patterns
publisher	Institutional Knowledge at Singapore Management University
publishDate	2011
url	https://ink.library.smu.edu.sg/sis_research/1358 http://dx.doi.org/10.1145/1951365.1951371
_version_	1770570992704290816
