Fast Filter-and-Refine Algorithms for Subsequence Selection
Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2002
|
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/1144 https://ink.library.smu.edu.sg/context/sis_research/article/2143/viewcontent/FastFilterandRefine_edited_.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Singapore Management University |
Language: | English |
id |
sg-smu-ink.sis_research-2143 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-21432017-07-11T08:25:26Z Fast Filter-and-Refine Algorithms for Subsequence Selection OOI, Beng-Chin PANG, Hwee Hwa WANG, Hao WONG, Limsoon YU, Cui Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the final refinement step. Analytical and experimental studies show that the proposed strategy is capable of reducing response time substantially while incurring only a small space overhead. 2002-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1144 info:doi/10.1109/IDEAS.2002.1029677 https://ink.library.smu.edu.sg/context/sis_research/article/2143/viewcontent/FastFilterandRefine_edited_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University DNA sequences approximate subsequence matching biology bitmap indexing structure data sequence condensing data sequence encoding false positive removal fast filter-and-refine algorithms gene sequences index sequence large DNA sequence databases large sequence databases protein sequences query processing query string response time small space overhead subsequence selection Databases and Information Systems Numerical Analysis and Scientific Computing |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
DNA sequences approximate subsequence matching biology bitmap indexing structure data sequence condensing data sequence encoding false positive removal fast filter-and-refine algorithms gene sequences index sequence large DNA sequence databases large sequence databases protein sequences query processing query string response time small space overhead subsequence selection Databases and Information Systems Numerical Analysis and Scientific Computing |
spellingShingle |
DNA sequences approximate subsequence matching biology bitmap indexing structure data sequence condensing data sequence encoding false positive removal fast filter-and-refine algorithms gene sequences index sequence large DNA sequence databases large sequence databases protein sequences query processing query string response time small space overhead subsequence selection Databases and Information Systems Numerical Analysis and Scientific Computing OOI, Beng-Chin PANG, Hwee Hwa WANG, Hao WONG, Limsoon YU, Cui Fast Filter-and-Refine Algorithms for Subsequence Selection |
description |
Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the final refinement step. Analytical and experimental studies show that the proposed strategy is capable of reducing response time substantially while incurring only a small space overhead. |
format |
text |
author |
OOI, Beng-Chin PANG, Hwee Hwa WANG, Hao WONG, Limsoon YU, Cui |
author_facet |
OOI, Beng-Chin PANG, Hwee Hwa WANG, Hao WONG, Limsoon YU, Cui |
author_sort |
OOI, Beng-Chin |
title |
Fast Filter-and-Refine Algorithms for Subsequence Selection |
title_short |
Fast Filter-and-Refine Algorithms for Subsequence Selection |
title_full |
Fast Filter-and-Refine Algorithms for Subsequence Selection |
title_fullStr |
Fast Filter-and-Refine Algorithms for Subsequence Selection |
title_full_unstemmed |
Fast Filter-and-Refine Algorithms for Subsequence Selection |
title_sort |
fast filter-and-refine algorithms for subsequence selection |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2002 |
url |
https://ink.library.smu.edu.sg/sis_research/1144 https://ink.library.smu.edu.sg/context/sis_research/article/2143/viewcontent/FastFilterandRefine_edited_.pdf |
_version_ |
1770570869651800064 |