New indexing methods for different types of spatial data

With the proliferation of GPS devices, spatial data is being generated at an unprecedented speed. Sometimes spatial data come with textual information and temporal information, forming spatio-textual data and trajectory data respectively. We are also seeing significant growths in the interest in the...

Full description

Saved in:

Bibliographic Details
Main Author:	Gu, Tu
Other Authors:	Gao Cong
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/173587
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-173587
record_format	dspace
spelling	sg-ntu-dr.10356-1735872024-03-07T08:52:06Z New indexing methods for different types of spatial data Gu, Tu Gao Cong Interdisciplinary Graduate School (IGS) gaocong@ntu.edu.sg Computer and Information Science With the proliferation of GPS devices, spatial data is being generated at an unprecedented speed. Sometimes spatial data come with textual information and temporal information, forming spatio-textual data and trajectory data respectively. We are also seeing significant growths in the interest in the management and analysis tasks (e.g., query processing) on different types of spatial data, which increasingly rely on good indexes. In this thesis, we explore various methods to build indexes for spatial data, spatio-textual data and trajectory data, in order to facilitate efficient query processing. First, we look into spatial data indexing. In recent years, learned indexes have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require to replace both the indexes and the query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without the need to change the structure or query processing algorithms of traditional R-Tree. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building and updating an R-Tree, instead of relying on hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets with up to more than 100 million spatial objects show that our RL based index outperforms the R-Tree and its variants in terms of query processing time. Second, we look into spatio-textual data indexing. We first propose a novel concept of local keyword frequency, which measures keyword frequency of terms in some subspace of the data space, and then leverage on that to build an index, i.e., LKFIR-Tree, through a one-by-one insertion approach. Compared with existing methods, which do not consider different data features in different parts of the data, our proposed method demonstrates superior performance when processing spatial keyword queries. Experiments on three real datasets show that LKDIR-Tree outperforms the baselines by up to 61% when processing boolean range spatial keyword queries. Third, we look into trajectory data indexing. We propose BT-Tree, which is built through a recursive bi-partitioning approach, for the processing of range and KNN queries for past trajectory data. We first propose a cost function based method (CFBM) to build the BT-Tree. Specifically, we design a novel cost function, which incorporates the characteristics of both the data and historical query workload, to decide how to partition a BT-Tree node. Then we propose a reinforcement learning (RL) based method to address CFBM’s limitations, such as making locally optimal decisions that may lead to global suboptimality. Experiments on three real datasets with up to 800 million data points show that the CFBM generally outperforms the baselines in terms of query processing time and the RL based method consistently outperforms the baselines and has more significant advantages on larger datasets. Doctor of Philosophy 2024-02-16T01:59:23Z 2024-02-16T01:59:23Z 2023 Thesis-Doctor of Philosophy Gu, T. (2023). New indexing methods for different types of spatial data. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173587 https://hdl.handle.net/10356/173587 10.32657/10356/173587 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science
spellingShingle	Computer and Information Science Gu, Tu New indexing methods for different types of spatial data
description	With the proliferation of GPS devices, spatial data is being generated at an unprecedented speed. Sometimes spatial data come with textual information and temporal information, forming spatio-textual data and trajectory data respectively. We are also seeing significant growths in the interest in the management and analysis tasks (e.g., query processing) on different types of spatial data, which increasingly rely on good indexes. In this thesis, we explore various methods to build indexes for spatial data, spatio-textual data and trajectory data, in order to facilitate efficient query processing. First, we look into spatial data indexing. In recent years, learned indexes have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require to replace both the indexes and the query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without the need to change the structure or query processing algorithms of traditional R-Tree. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building and updating an R-Tree, instead of relying on hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets with up to more than 100 million spatial objects show that our RL based index outperforms the R-Tree and its variants in terms of query processing time. Second, we look into spatio-textual data indexing. We first propose a novel concept of local keyword frequency, which measures keyword frequency of terms in some subspace of the data space, and then leverage on that to build an index, i.e., LKFIR-Tree, through a one-by-one insertion approach. Compared with existing methods, which do not consider different data features in different parts of the data, our proposed method demonstrates superior performance when processing spatial keyword queries. Experiments on three real datasets show that LKDIR-Tree outperforms the baselines by up to 61% when processing boolean range spatial keyword queries. Third, we look into trajectory data indexing. We propose BT-Tree, which is built through a recursive bi-partitioning approach, for the processing of range and KNN queries for past trajectory data. We first propose a cost function based method (CFBM) to build the BT-Tree. Specifically, we design a novel cost function, which incorporates the characteristics of both the data and historical query workload, to decide how to partition a BT-Tree node. Then we propose a reinforcement learning (RL) based method to address CFBM’s limitations, such as making locally optimal decisions that may lead to global suboptimality. Experiments on three real datasets with up to 800 million data points show that the CFBM generally outperforms the baselines in terms of query processing time and the RL based method consistently outperforms the baselines and has more significant advantages on larger datasets.
author2	Gao Cong
author_facet	Gao Cong Gu, Tu
format	Thesis-Doctor of Philosophy
author	Gu, Tu
author_sort	Gu, Tu
title	New indexing methods for different types of spatial data
title_short	New indexing methods for different types of spatial data
title_full	New indexing methods for different types of spatial data
title_fullStr	New indexing methods for different types of spatial data
title_full_unstemmed	New indexing methods for different types of spatial data
title_sort	new indexing methods for different types of spatial data
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/173587
_version_	1794549373542072320

New indexing methods for different types of spatial data

Similar Items