Multiple continuous top-K queries over data stream

Continuous top-kk query over sliding window is a fundamental challenge in the domain of streaming data management. Specifically, a continuous top-k query qq monitors the window WW, returning the kk objects with the highest scores to the system with each slide of the window. This paper delves into on...

Full description

Saved in:
Bibliographic Details
Main Authors: ZHU, Rui, JIA, Yujin, YANG, Xiaochun, ZHENG, Baihua, WANG, Bin, ZONG, Chuanyu
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9284
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10284
record_format dspace
spelling sg-smu-ink.sis_research-102842024-09-09T03:00:04Z Multiple continuous top-K queries over data stream ZHU, Rui JIA, Yujin YANG, Xiaochun ZHENG, Baihua WANG, Bin ZONG, Chuanyu Continuous top-kk query over sliding window is a fundamental challenge in the domain of streaming data management. Specifically, a continuous top-k query qq monitors the window WW, returning the kk objects with the highest scores to the system with each slide of the window. This paper delves into one of its important variants, referred to as multiple continuous top. kk queries over data stream, which holds significant applications. While various efforts have been made to support continuous top-k query, few have addressed the complexities of multiple continuous top-k queries. The prevailing approach involves selecting a minimal number of objects in the window as candidates, incrementally maintaining them, and using them to support query processing as efficiently as possible. However, these endeavors exhibit sensitivity to the query workload scale or query parameters such as kk, the window length nn, and others. Consequently, they incur high running/space cost in updating the candidate set. In this paper, we propose a novel index PH-Tree (Partition and Heap-based Binary Tree), designed to facilitate multiple continuous top-k queries. We partition the query window into a group of disjoint partitions and use PH-Tree to organize these partitions. Additionally, the PH-Tree allows for flexible candidate selection based on the size of each partition, parameter distribution of queries and score distribution of objects. We further develop a group of efficient algorithms to support candidate set incremental maintenance and query processing. The effectiveness and efficiency of the proposed algorithms are validated through extensive theoretical analysis and exneriments detailed in this paper. 2024-05-16T07:00:00Z text https://ink.library.smu.edu.sg/sis_research/9284 info:doi/10.1109/ICDE60146.2024.00129 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Sensitivity Costs Query processing Binary trees Data engineering Partitioning algorithms Maintenance Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Sensitivity
Costs
Query processing
Binary trees
Data engineering
Partitioning algorithms
Maintenance
Databases and Information Systems
spellingShingle Sensitivity
Costs
Query processing
Binary trees
Data engineering
Partitioning algorithms
Maintenance
Databases and Information Systems
ZHU, Rui
JIA, Yujin
YANG, Xiaochun
ZHENG, Baihua
WANG, Bin
ZONG, Chuanyu
Multiple continuous top-K queries over data stream
description Continuous top-kk query over sliding window is a fundamental challenge in the domain of streaming data management. Specifically, a continuous top-k query qq monitors the window WW, returning the kk objects with the highest scores to the system with each slide of the window. This paper delves into one of its important variants, referred to as multiple continuous top. kk queries over data stream, which holds significant applications. While various efforts have been made to support continuous top-k query, few have addressed the complexities of multiple continuous top-k queries. The prevailing approach involves selecting a minimal number of objects in the window as candidates, incrementally maintaining them, and using them to support query processing as efficiently as possible. However, these endeavors exhibit sensitivity to the query workload scale or query parameters such as kk, the window length nn, and others. Consequently, they incur high running/space cost in updating the candidate set. In this paper, we propose a novel index PH-Tree (Partition and Heap-based Binary Tree), designed to facilitate multiple continuous top-k queries. We partition the query window into a group of disjoint partitions and use PH-Tree to organize these partitions. Additionally, the PH-Tree allows for flexible candidate selection based on the size of each partition, parameter distribution of queries and score distribution of objects. We further develop a group of efficient algorithms to support candidate set incremental maintenance and query processing. The effectiveness and efficiency of the proposed algorithms are validated through extensive theoretical analysis and exneriments detailed in this paper.
format text
author ZHU, Rui
JIA, Yujin
YANG, Xiaochun
ZHENG, Baihua
WANG, Bin
ZONG, Chuanyu
author_facet ZHU, Rui
JIA, Yujin
YANG, Xiaochun
ZHENG, Baihua
WANG, Bin
ZONG, Chuanyu
author_sort ZHU, Rui
title Multiple continuous top-K queries over data stream
title_short Multiple continuous top-K queries over data stream
title_full Multiple continuous top-K queries over data stream
title_fullStr Multiple continuous top-K queries over data stream
title_full_unstemmed Multiple continuous top-K queries over data stream
title_sort multiple continuous top-k queries over data stream
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9284
_version_ 1814047872481492992