Medoid Queries in Large Spatial Databases
Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the me...
Saved in:
Main Authors: | , , |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2005
|
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/874 https://ink.library.smu.edu.sg/context/sis_research/article/1873/viewcontent/SSTD05_MED.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Singapore Management University |
Language: | English |
id |
sg-smu-ink.sis_research-1873 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-18732010-11-29T07:54:04Z Medoid Queries in Large Spatial Databases MOURATIDIS, Kyriakos Papadias, Dimitris Papadimitriou, Spiros Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands). 2005-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/874 info:doi/10.1007/11535331_4 https://ink.library.smu.edu.sg/context/sis_research/article/1873/viewcontent/SSTD05_MED.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems Numerical Analysis and Scientific Computing |
spellingShingle |
Databases and Information Systems Numerical Analysis and Scientific Computing MOURATIDIS, Kyriakos Papadias, Dimitris Papadimitriou, Spiros Medoid Queries in Large Spatial Databases |
description |
Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands). |
format |
text |
author |
MOURATIDIS, Kyriakos Papadias, Dimitris Papadimitriou, Spiros |
author_facet |
MOURATIDIS, Kyriakos Papadias, Dimitris Papadimitriou, Spiros |
author_sort |
MOURATIDIS, Kyriakos |
title |
Medoid Queries in Large Spatial Databases |
title_short |
Medoid Queries in Large Spatial Databases |
title_full |
Medoid Queries in Large Spatial Databases |
title_fullStr |
Medoid Queries in Large Spatial Databases |
title_full_unstemmed |
Medoid Queries in Large Spatial Databases |
title_sort |
medoid queries in large spatial databases |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2005 |
url |
https://ink.library.smu.edu.sg/sis_research/874 https://ink.library.smu.edu.sg/context/sis_research/article/1873/viewcontent/SSTD05_MED.pdf |
_version_ |
1770570746499694592 |