Computing Immutable Regions for Subspace Top-k Queries

Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user’s preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the...

Full description

Saved in:
Bibliographic Details
Main Authors: MOURATIDIS, Kyriakos, PANG, Hwee Hwa
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2013
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/1624
https://ink.library.smu.edu.sg/context/sis_research/article/2623/viewcontent/p73_mouratidis_pv.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-2623
record_format dspace
spelling sg-smu-ink.sis_research-26232020-03-25T03:42:50Z Computing Immutable Regions for Subspace Top-k Queries MOURATIDIS, Kyriakos PANG, Hwee Hwa Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user’s preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for finetuning the query weights, etc. In this paper, we focus on top-k queries with linear preference functions over the queried dimensions. We codify the conditions under which changes in a dimension’s weight invalidate the query result, and develop algorithms to compute the immutable regions. In general, this entails the examination of numerous non-result tuples. To reduce processing time, we introduce a pruning technique and a thresholding mechanism that allow the immutable regions to be determined correctly after examining only a small number of non-result tuples. We demonstrate empirically that the two techniques combine well to form a robust and highly resource-efficient algorithm. We verify the generality of our findings using real high- dimensional data from different domains (documents, images, etc) and with different characteristics. 2013-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1624 info:doi/10.14778/2535568.2448941 https://ink.library.smu.edu.sg/context/sis_research/article/2623/viewcontent/p73_mouratidis_pv.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Corresponding weights Different domains High dimensional data High-dimensional dataset Preference functions Pruning techniques Resource-efficient User's preferences Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Corresponding weights
Different domains
High dimensional data
High-dimensional dataset
Preference functions
Pruning techniques
Resource-efficient
User's preferences
Databases and Information Systems
Numerical Analysis and Scientific Computing
spellingShingle Corresponding weights
Different domains
High dimensional data
High-dimensional dataset
Preference functions
Pruning techniques
Resource-efficient
User's preferences
Databases and Information Systems
Numerical Analysis and Scientific Computing
MOURATIDIS, Kyriakos
PANG, Hwee Hwa
Computing Immutable Regions for Subspace Top-k Queries
description Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user’s preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for finetuning the query weights, etc. In this paper, we focus on top-k queries with linear preference functions over the queried dimensions. We codify the conditions under which changes in a dimension’s weight invalidate the query result, and develop algorithms to compute the immutable regions. In general, this entails the examination of numerous non-result tuples. To reduce processing time, we introduce a pruning technique and a thresholding mechanism that allow the immutable regions to be determined correctly after examining only a small number of non-result tuples. We demonstrate empirically that the two techniques combine well to form a robust and highly resource-efficient algorithm. We verify the generality of our findings using real high- dimensional data from different domains (documents, images, etc) and with different characteristics.
format text
author MOURATIDIS, Kyriakos
PANG, Hwee Hwa
author_facet MOURATIDIS, Kyriakos
PANG, Hwee Hwa
author_sort MOURATIDIS, Kyriakos
title Computing Immutable Regions for Subspace Top-k Queries
title_short Computing Immutable Regions for Subspace Top-k Queries
title_full Computing Immutable Regions for Subspace Top-k Queries
title_fullStr Computing Immutable Regions for Subspace Top-k Queries
title_full_unstemmed Computing Immutable Regions for Subspace Top-k Queries
title_sort computing immutable regions for subspace top-k queries
publisher Institutional Knowledge at Singapore Management University
publishDate 2013
url https://ink.library.smu.edu.sg/sis_research/1624
https://ink.library.smu.edu.sg/context/sis_research/article/2623/viewcontent/p73_mouratidis_pv.pdf
_version_ 1770571354101252096