Minimum coresets for maxima representation of multidimensional data

Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e....

Full description

Saved in:
Bibliographic Details
Main Authors: WANG, Yanhao, MATHIOUDAKIS, Michael, LI, Yuchen, TAN, Kian-Lee
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/6202
https://ink.library.smu.edu.sg/context/sis_research/article/7205/viewcontent/3452021.3458322.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-7205
record_format dspace
spelling sg-smu-ink.sis_research-72052021-10-14T06:59:59Z Minimum coresets for maxima representation of multidimensional data WANG, Yanhao MATHIOUDAKIS, Michael LI, Yuchen TAN, Kian-Lee Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set �� of points in R �� , where �� is a small constant, and an error parameter �� ∈ (0, 1), a subset �� ⊆ �� is an ��-coreset for the maxima representation of �� iff the maximum of �� is an ��-approximation of the maximum of �� for any vector �� ∈ R �� , where the maximum is taken over the inner products between the set of points (�� or ��) and ��. We define a novel minimum ��-coreset problem that asks for an ��-coreset of the smallest size for the maxima representation of a point set. For the two-dimensional case, we develop an optimal polynomial-time algorithm for the minimum ��-coreset problem by transforming it into the shortest-cycle problem in a directed graph. Then, we prove that this problem is NP-hard in three or higher dimensions and present polynomial-time approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms. 2021-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6202 info:doi/10.1145/3452021.3458322 https://ink.library.smu.edu.sg/context/sis_research/article/7205/viewcontent/3452021.3458322.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Coreset maxima representation ��-kernel convex hull regret minimizing set Databases and Information Systems Data Storage Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Coreset
maxima representation
��-kernel
convex hull
regret minimizing set
Databases and Information Systems
Data Storage Systems
spellingShingle Coreset
maxima representation
��-kernel
convex hull
regret minimizing set
Databases and Information Systems
Data Storage Systems
WANG, Yanhao
MATHIOUDAKIS, Michael
LI, Yuchen
TAN, Kian-Lee
Minimum coresets for maxima representation of multidimensional data
description Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set �� of points in R �� , where �� is a small constant, and an error parameter �� ∈ (0, 1), a subset �� ⊆ �� is an ��-coreset for the maxima representation of �� iff the maximum of �� is an ��-approximation of the maximum of �� for any vector �� ∈ R �� , where the maximum is taken over the inner products between the set of points (�� or ��) and ��. We define a novel minimum ��-coreset problem that asks for an ��-coreset of the smallest size for the maxima representation of a point set. For the two-dimensional case, we develop an optimal polynomial-time algorithm for the minimum ��-coreset problem by transforming it into the shortest-cycle problem in a directed graph. Then, we prove that this problem is NP-hard in three or higher dimensions and present polynomial-time approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms.
format text
author WANG, Yanhao
MATHIOUDAKIS, Michael
LI, Yuchen
TAN, Kian-Lee
author_facet WANG, Yanhao
MATHIOUDAKIS, Michael
LI, Yuchen
TAN, Kian-Lee
author_sort WANG, Yanhao
title Minimum coresets for maxima representation of multidimensional data
title_short Minimum coresets for maxima representation of multidimensional data
title_full Minimum coresets for maxima representation of multidimensional data
title_fullStr Minimum coresets for maxima representation of multidimensional data
title_full_unstemmed Minimum coresets for maxima representation of multidimensional data
title_sort minimum coresets for maxima representation of multidimensional data
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/6202
https://ink.library.smu.edu.sg/context/sis_research/article/7205/viewcontent/3452021.3458322.pdf
_version_ 1770575890411945984