Indexing metric uncertain data for range queries and range joins

Bibliographic Details
Main Authors: CHEN, Lu, GAO, Yunjun, ZHONG, Aoxiao, JENSEN, Christian S., CHEN, Gang, ZHENG, Baihua
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Online Access:https://ink.library.smu.edu.sg/sis_research/3707
https://ink.library.smu.edu.sg/context/sis_research/article/4709/viewcontent/101007_s00778_017_0465_6.pdf
Institution: Singapore Management University
Subjects: Range query; Range join; Uncertain data; Metric space; Index structure; Databases and Information Systems; Data Storage Systems
DOI: 10.1007/s00778-017-0465-6
Publication date: 2017-08-01
Collection: Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries)
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Range queries and range joins in metric spaces have applications in many areas, including GIS, computational biology, and data integration, where metric uncertain data exist in different forms, resulting from circumstances such as equipment limitations, high-throughput sequencing technologies, and privacy preservation. We represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed in order to support probabilistic range queries and range joins for a wide range of uncertain data types and similarity metrics. Both index structures use a small set of effective pivots chosen based on a newly defined criterion and employ the B+-tree(s) as the underlying index. In addition, we present efficient metric probabilistic range query and metric probabilistic range join algorithms, which utilize validation and pruning techniques based on derived probability lower and upper bounds. Extensive experiments with both real and synthetic data sets demonstrate that, compared against existing state-of-the-art indexes for metric uncertain data, the UPB-tree and the UPB-forest incur much lower construction costs, consume less storage space, and can support more efficient metric probabilistic range queries and metric probabilistic range joins.
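
The idea of validating and pruning with probability lower and upper bounds can be illustrated with a generic pivot-based filter. This is a minimal sketch, not the UPB-tree or the authors' derived bounds. It assumes a discrete object-level model (each uncertain object is a finite set of instances with probabilities), Euclidean distance as the metric, at least one pivot, and a flat list standing in for the B+-tree. All names (`UncertainObject`, `build_index`, `prob_range_query`) are hypothetical. For each instance, the triangle inequality bounds d(q, x) using precomputed pivot distances. Summing the probabilities of instances that are surely inside the radius gives a lower bound on Pr[d(q, o) <= r], and summing those that are not surely outside gives an upper bound. An object is validated when the lower bound reaches the threshold, pruned when the upper bound falls below it, and otherwise refined with exact distances.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class UncertainObject:
    """Hypothetical object-level model: a finite set of (instance, probability) pairs."""
    name: str
    instances: List[Tuple[Point, float]]


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def build_index(objects: Sequence[UncertainObject], pivots: Sequence[Point], dist: Metric):
    """Precompute d(x, p) for every instance x and pivot p.
    A flat list stands in for the B+-tree used by the actual indexes."""
    return [(obj, [[dist(x, p) for p in pivots] for x, _ in obj.instances]) for obj in objects]


def prob_range_query(query: Point, r: float, theta: float, index,
                     pivots: Sequence[Point], dist: Metric):
    """Return names of objects with Pr[d(query, o) <= r] >= theta, plus the number of
    exact query-to-instance distances computed during refinement (assumes >= 1 pivot)."""
    q_piv = [dist(query, p) for p in pivots]  # d(q, p), computed once per query
    result: List[str] = []
    exact_calls = 0
    for obj, table in index:
        lower = 0.0  # probability mass of instances surely within r
        upper = 0.0  # probability mass of instances not surely outside r
        undecided = []
        for (x, prob), row in zip(obj.instances, table):
            # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x) <= d(q,p) + d(x,p)
            lo = max(abs(qp - xp) for qp, xp in zip(q_piv, row))
            hi = min(qp + xp for qp, xp in zip(q_piv, row))
            if hi <= r:
                lower += prob
                upper += prob
            elif lo <= r:
                upper += prob
                undecided.append((x, prob))
            # otherwise lo > r: the instance is surely outside and adds nothing
        if lower >= theta:  # validated without any exact distance computation
            result.append(obj.name)
            continue
        if upper < theta:  # pruned: the threshold cannot be reached
            continue
        for x, prob in undecided:  # refine only the undecided instances
            exact_calls += 1
            if dist(query, x) <= r:
                lower += prob
        if lower >= theta:
            result.append(obj.name)
    return result, exact_calls


if __name__ == "__main__":
    pivots = [(0.0, 0.0), (10.0, 0.0)]
    objects = [
        UncertainObject("A", [((1.0, 1.0), 0.5), ((2.0, 0.0), 0.5)]),
        UncertainObject("B", [((0.5, 0.0), 0.5), ((0.0, 0.5), 0.5)]),
        UncertainObject("C", [((20.0, 0.0), 0.5), ((30.0, 0.0), 0.5)]),
        UncertainObject("D", [((1.5, 0.0), 0.25), ((8.0, 0.0), 0.75)]),
    ]
    index = build_index(objects, pivots, euclidean)
    # A is refined, B validated, C and D pruned -> (['A', 'B'], 2)
    print(prob_range_query((1.0, 0.0), 2.0, 0.5, index, pivots, euclidean))
```

In this toy run only the two instances of A need exact distances; B is accepted and C and D are discarded from pivot distances alone. A real index would also avoid scanning every object, which is the role the B+-tree plays in the structures described above.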