Fast Filter-and-Refine Algorithms for Subsequence Selection

Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper...

Full description

Saved in:
Bibliographic Details
Main Authors: OOI, Beng-Chin, PANG, Hwee Hwa, WANG, Hao, WONG, Limsoon, YU, Cui
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2002
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/1144
https://ink.library.smu.edu.sg/context/sis_research/article/2143/viewcontent/FastFilterandRefine_edited_.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-2143
record_format dspace
spelling sg-smu-ink.sis_research-21432017-07-11T08:25:26Z Fast Filter-and-Refine Algorithms for Subsequence Selection OOI, Beng-Chin PANG, Hwee Hwa WANG, Hao WONG, Limsoon YU, Cui Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the final refinement step. Analytical and experimental studies show that the proposed strategy is capable of reducing response time substantially while incurring only a small space overhead. 2002-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1144 info:doi/10.1109/IDEAS.2002.1029677 https://ink.library.smu.edu.sg/context/sis_research/article/2143/viewcontent/FastFilterandRefine_edited_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University DNA sequences approximate subsequence matching biology bitmap indexing structure data sequence condensing data sequence encoding false positive removal fast filter-and-refine algorithms gene sequences index sequence large DNA sequence databases large sequence databases protein sequences query processing query string response time small space overhead subsequence selection Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic DNA sequences
approximate subsequence matching
biology
bitmap indexing structure
data sequence condensing
data sequence encoding
false positive removal
fast filter-and-refine algorithms
gene sequences
index sequence
large DNA sequence databases
large sequence databases
protein sequences
query processing
query string
response time
small space overhead
subsequence selection
Databases and Information Systems
Numerical Analysis and Scientific Computing
spellingShingle DNA sequences
approximate subsequence matching
biology
bitmap indexing structure
data sequence condensing
data sequence encoding
false positive removal
fast filter-and-refine algorithms
gene sequences
index sequence
large DNA sequence databases
large sequence databases
protein sequences
query processing
query string
response time
small space overhead
subsequence selection
Databases and Information Systems
Numerical Analysis and Scientific Computing
OOI, Beng-Chin
PANG, Hwee Hwa
WANG, Hao
WONG, Limsoon
YU, Cui
Fast Filter-and-Refine Algorithms for Subsequence Selection
description Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the final refinement step. Analytical and experimental studies show that the proposed strategy is capable of reducing response time substantially while incurring only a small space overhead.
format text
author OOI, Beng-Chin
PANG, Hwee Hwa
WANG, Hao
WONG, Limsoon
YU, Cui
author_facet OOI, Beng-Chin
PANG, Hwee Hwa
WANG, Hao
WONG, Limsoon
YU, Cui
author_sort OOI, Beng-Chin
title Fast Filter-and-Refine Algorithms for Subsequence Selection
title_short Fast Filter-and-Refine Algorithms for Subsequence Selection
title_full Fast Filter-and-Refine Algorithms for Subsequence Selection
title_fullStr Fast Filter-and-Refine Algorithms for Subsequence Selection
title_full_unstemmed Fast Filter-and-Refine Algorithms for Subsequence Selection
title_sort fast filter-and-refine algorithms for subsequence selection
publisher Institutional Knowledge at Singapore Management University
publishDate 2002
url https://ink.library.smu.edu.sg/sis_research/1144
https://ink.library.smu.edu.sg/context/sis_research/article/2143/viewcontent/FastFilterandRefine_edited_.pdf
_version_ 1770570869651800064