Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists

The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed with per-attribute sorted lists, and a thresh...

Full description

Saved in:

Bibliographic Details
Main Authors:	PANG, Hwee Hwa, DING, Xuhua, ZHENG, Baihua
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2010
Subjects:	Databases and Information Systems Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/800 https://ink.library.smu.edu.sg/context/sis_research/article/1799/viewcontent/vldbj.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

id	sg-smu-ink.sis_research-1799
record_format	dspace
spelling	sg-smu-ink.sis_research-17992010-12-13T06:19:46Z Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists PANG, Hwee Hwa DING, Xuhua ZHENG, Baihua The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed with per-attribute sorted lists, and a threshold algorithm (TA) is applied on the lists involved in each query. The TA executes in two phases--find a cut-off threshold for the top-k result scores, then evaluate all the records that could score above the threshold. In this paper, we focus on exact top-k queries that involve monotonic linear scoring functions over disk-resident sorted lists. We introduce a model for estimating the depths to which each sorted list needs to be processed in the two phases, so that (most of) the required records can be fetched efficiently through sequential or batched I/Os. We also devise a mechanism to quickly rank the data that qualify for the query answer and to eliminate those that do not, in order to reduce the computation demand of the query processor. Extensive experiments with four different datasets confirm that our schemes achieve substantial performance speed-up of between two times and two orders of magnitude over existing TAs, at the expense of a memory overhead of 4.8 bits per attribute value. Moreover, our scheme is robust to different data distributions and query characteristics. 2010-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/800 info:doi/10.1007/s00778-009-0174-x https://ink.library.smu.edu.sg/context/sis_research/article/1799/viewcontent/vldbj.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems Numerical Analysis and Scientific Computing
spellingShingle	Databases and Information Systems Numerical Analysis and Scientific Computing PANG, Hwee Hwa DING, Xuhua ZHENG, Baihua Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists
description	The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed with per-attribute sorted lists, and a threshold algorithm (TA) is applied on the lists involved in each query. The TA executes in two phases--find a cut-off threshold for the top-k result scores, then evaluate all the records that could score above the threshold. In this paper, we focus on exact top-k queries that involve monotonic linear scoring functions over disk-resident sorted lists. We introduce a model for estimating the depths to which each sorted list needs to be processed in the two phases, so that (most of) the required records can be fetched efficiently through sequential or batched I/Os. We also devise a mechanism to quickly rank the data that qualify for the query answer and to eliminate those that do not, in order to reduce the computation demand of the query processor. Extensive experiments with four different datasets confirm that our schemes achieve substantial performance speed-up of between two times and two orders of magnitude over existing TAs, at the expense of a memory overhead of 4.8 bits per attribute value. Moreover, our scheme is robust to different data distributions and query characteristics.
format	text
author	PANG, Hwee Hwa DING, Xuhua ZHENG, Baihua
author_facet	PANG, Hwee Hwa DING, Xuhua ZHENG, Baihua
author_sort	PANG, Hwee Hwa
title	Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists
title_short	Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists
title_full	Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists
title_fullStr	Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists
title_full_unstemmed	Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists
title_sort	efficient processing of exact top-k queries over disk-resident sorted lists
publisher	Institutional Knowledge at Singapore Management University
publishDate	2010
url	https://ink.library.smu.edu.sg/sis_research/800 https://ink.library.smu.edu.sg/context/sis_research/article/1799/viewcontent/vldbj.pdf
_version_	1770570720528564224

Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists

Similar Items