New indexing methods for different types of spatial data

With the proliferation of GPS devices, spatial data is being generated at an unprecedented speed. Sometimes spatial data come with textual information and temporal information, forming spatio-textual data and trajectory data respectively. We are also seeing significant growths in the interest in the...

Full description

Saved in:
Bibliographic Details
Main Author: Gu, Tu
Other Authors: Gao Cong
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/173587
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-173587
record_format dspace
spelling sg-ntu-dr.10356-1735872024-03-07T08:52:06Z New indexing methods for different types of spatial data Gu, Tu Gao Cong Interdisciplinary Graduate School (IGS) gaocong@ntu.edu.sg Computer and Information Science With the proliferation of GPS devices, spatial data is being generated at an unprecedented speed. Sometimes spatial data come with textual information and temporal information, forming spatio-textual data and trajectory data respectively. We are also seeing significant growths in the interest in the management and analysis tasks (e.g., query processing) on different types of spatial data, which increasingly rely on good indexes. In this thesis, we explore various methods to build indexes for spatial data, spatio-textual data and trajectory data, in order to facilitate efficient query processing. First, we look into spatial data indexing. In recent years, learned indexes have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require to replace both the indexes and the query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without the need to change the structure or query processing algorithms of traditional R-Tree. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building and updating an R-Tree, instead of relying on hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets with up to more than 100 million spatial objects show that our RL based index outperforms the R-Tree and its variants in terms of query processing time. Second, we look into spatio-textual data indexing. We first propose a novel concept of local keyword frequency, which measures keyword frequency of terms in some subspace of the data space, and then leverage on that to build an index, i.e., LKFIR-Tree, through a one-by-one insertion approach. Compared with existing methods, which do not consider different data features in different parts of the data, our proposed method demonstrates superior performance when processing spatial keyword queries. Experiments on three real datasets show that LKDIR-Tree outperforms the baselines by up to 61% when processing boolean range spatial keyword queries. Third, we look into trajectory data indexing. We propose BT-Tree, which is built through a recursive bi-partitioning approach, for the processing of range and KNN queries for past trajectory data. We first propose a cost function based method (CFBM) to build the BT-Tree. Specifically, we design a novel cost function, which incorporates the characteristics of both the data and historical query workload, to decide how to partition a BT-Tree node. Then we propose a reinforcement learning (RL) based method to address CFBM’s limitations, such as making locally optimal decisions that may lead to global suboptimality. Experiments on three real datasets with up to 800 million data points show that the CFBM generally outperforms the baselines in terms of query processing time and the RL based method consistently outperforms the baselines and has more significant advantages on larger datasets. Doctor of Philosophy 2024-02-16T01:59:23Z 2024-02-16T01:59:23Z 2023 Thesis-Doctor of Philosophy Gu, T. (2023). New indexing methods for different types of spatial data. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173587 https://hdl.handle.net/10356/173587 10.32657/10356/173587 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Gu, Tu
New indexing methods for different types of spatial data
description With the proliferation of GPS devices, spatial data is being generated at an unprecedented speed. Sometimes spatial data come with textual information and temporal information, forming spatio-textual data and trajectory data respectively. We are also seeing significant growths in the interest in the management and analysis tasks (e.g., query processing) on different types of spatial data, which increasingly rely on good indexes. In this thesis, we explore various methods to build indexes for spatial data, spatio-textual data and trajectory data, in order to facilitate efficient query processing. First, we look into spatial data indexing. In recent years, learned indexes have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require to replace both the indexes and the query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without the need to change the structure or query processing algorithms of traditional R-Tree. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building and updating an R-Tree, instead of relying on hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets with up to more than 100 million spatial objects show that our RL based index outperforms the R-Tree and its variants in terms of query processing time. Second, we look into spatio-textual data indexing. We first propose a novel concept of local keyword frequency, which measures keyword frequency of terms in some subspace of the data space, and then leverage on that to build an index, i.e., LKFIR-Tree, through a one-by-one insertion approach. Compared with existing methods, which do not consider different data features in different parts of the data, our proposed method demonstrates superior performance when processing spatial keyword queries. Experiments on three real datasets show that LKDIR-Tree outperforms the baselines by up to 61% when processing boolean range spatial keyword queries. Third, we look into trajectory data indexing. We propose BT-Tree, which is built through a recursive bi-partitioning approach, for the processing of range and KNN queries for past trajectory data. We first propose a cost function based method (CFBM) to build the BT-Tree. Specifically, we design a novel cost function, which incorporates the characteristics of both the data and historical query workload, to decide how to partition a BT-Tree node. Then we propose a reinforcement learning (RL) based method to address CFBM’s limitations, such as making locally optimal decisions that may lead to global suboptimality. Experiments on three real datasets with up to 800 million data points show that the CFBM generally outperforms the baselines in terms of query processing time and the RL based method consistently outperforms the baselines and has more significant advantages on larger datasets.
author2 Gao Cong
author_facet Gao Cong
Gu, Tu
format Thesis-Doctor of Philosophy
author Gu, Tu
author_sort Gu, Tu
title New indexing methods for different types of spatial data
title_short New indexing methods for different types of spatial data
title_full New indexing methods for different types of spatial data
title_fullStr New indexing methods for different types of spatial data
title_full_unstemmed New indexing methods for different types of spatial data
title_sort new indexing methods for different types of spatial data
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/173587
_version_ 1794549373542072320