Indexing Metric Uncertain Data for Range Queries

Range queries in metric spaces have applications in many areas such as multimedia retrieval, computational biology, and location-based services, where metric uncertain data exists in different forms, resulting from equipment limitations, high-throughput sequencing technologies, privacy preservation,...

Full description

Saved in:
Bibliographic Details
Main Authors: CHEN, Lu, GAO, Yunjun, LI, Xinhan, JENSEN, Christian S., CHEN, Gang, ZHENG, Baihua
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2015
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/2892
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-3892
record_format dspace
spelling sg-smu-ink.sis_research-38922016-01-08T07:42:07Z Indexing Metric Uncertain Data for Range Queries CHEN, Lu GAO, Yunjun LI, Xinhan JENSEN, Christian S. CHEN, Gang ZHENG, Baihua Range queries in metric spaces have applications in many areas such as multimedia retrieval, computational biology, and location-based services, where metric uncertain data exists in different forms, resulting from equipment limitations, high-throughput sequencing technologies, privacy preservation, or others. In this paper, we represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed accordingly in order to support probabilistic range queries w.r.t. a wide range of uncertain data types and similarity metrics. Both index structures use a small set of effective pivots chosen based on a newly defined criterion, and employ the B+-tree(s) as the underlying index. By design, they are easy to be integrated into any existing DBMS. In addition, we present efficient metric probabilistic range query algorithms, which utilize the validation and pruning techniques based on our derived probability lower and upper bounds. Extensive experiments with both real and synthetic data sets demonstrate that, compared against existing state-of-the-art indexes for metric uncertain data, the UPB-tree and UPB-forest incur much lower construction costs, consume smaller storage spaces, and can support more efficient metric probabilistic range queries. 2015-06-04T07:00:00Z text https://ink.library.smu.edu.sg/sis_research/2892 info:doi/10.1145/2723372.2723728 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University metric space index structure range query uncertain data Computer Sciences Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic metric space
index structure
range query
uncertain data
Computer Sciences
Databases and Information Systems
Numerical Analysis and Scientific Computing
spellingShingle metric space
index structure
range query
uncertain data
Computer Sciences
Databases and Information Systems
Numerical Analysis and Scientific Computing
CHEN, Lu
GAO, Yunjun
LI, Xinhan
JENSEN, Christian S.
CHEN, Gang
ZHENG, Baihua
Indexing Metric Uncertain Data for Range Queries
description Range queries in metric spaces have applications in many areas such as multimedia retrieval, computational biology, and location-based services, where metric uncertain data exists in different forms, resulting from equipment limitations, high-throughput sequencing technologies, privacy preservation, or others. In this paper, we represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed accordingly in order to support probabilistic range queries w.r.t. a wide range of uncertain data types and similarity metrics. Both index structures use a small set of effective pivots chosen based on a newly defined criterion, and employ the B+-tree(s) as the underlying index. By design, they are easy to be integrated into any existing DBMS. In addition, we present efficient metric probabilistic range query algorithms, which utilize the validation and pruning techniques based on our derived probability lower and upper bounds. Extensive experiments with both real and synthetic data sets demonstrate that, compared against existing state-of-the-art indexes for metric uncertain data, the UPB-tree and UPB-forest incur much lower construction costs, consume smaller storage spaces, and can support more efficient metric probabilistic range queries.
format text
author CHEN, Lu
GAO, Yunjun
LI, Xinhan
JENSEN, Christian S.
CHEN, Gang
ZHENG, Baihua
author_facet CHEN, Lu
GAO, Yunjun
LI, Xinhan
JENSEN, Christian S.
CHEN, Gang
ZHENG, Baihua
author_sort CHEN, Lu
title Indexing Metric Uncertain Data for Range Queries
title_short Indexing Metric Uncertain Data for Range Queries
title_full Indexing Metric Uncertain Data for Range Queries
title_fullStr Indexing Metric Uncertain Data for Range Queries
title_full_unstemmed Indexing Metric Uncertain Data for Range Queries
title_sort indexing metric uncertain data for range queries
publisher Institutional Knowledge at Singapore Management University
publishDate 2015
url https://ink.library.smu.edu.sg/sis_research/2892
_version_ 1770572665841516544