UpSizeR: Synthetically scaling an empirical relational database

Bibliographic Details
Main Authors: TAY, Y. C., DAI, Bing Tian, WANG, Daniel T., SUN, Eldora Y., LIN, Yong, LIN, Yuting
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2013
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/2048
https://ink.library.smu.edu.sg/context/sis_research/article/3047/viewcontent/UpSizeRInfoSys_PP.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3047
record_format dspace
spelling sg-smu-ink.sis_research-3047 2020-04-02T06:17:50Z
2013-11-01T07:00:00Z
application/pdf
info:doi/10.1016/j.is.2013.07.004
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic application-specific benchmarking
synthetic data generation
scale factor
empirical dataset
attribute value correlation
social networks
Databases and Information Systems
description The TPC benchmarks have helped users evaluate database system performance at different scales. Although each benchmark is domain-specific, it is not equally relevant to different applications in the same domain. The present proliferation of applications also leaves many of them uncovered by the very limited number of current TPC benchmarks. There is therefore a need to develop tools for application-specific database benchmarking. This paper presents UpSizeR, a software that addresses the Dataset Scaling Problem: Given an empirical set of relational tables D and a scale factor s, generate a database state D̃ that is similar to D but s times its size. Such a tool can be useful for scaling up D for scalability testing (s > 1), scaling down for application testing (s < 1), or anonymization (s = 1). Experiments with Flickr show that query results and response times on UpSizeR output match those on crawled data. They also accurately predict throughput degradation for a scale out test. The UpSizeR version in this paper focuses on extracting and replicating the correlation induced by the primary and foreign keys. There are many other forms of correlation involving nonkey values. It is a large task to develop UpSizeR into a tool that can extract and replicate all important correlation, so community effort is required. The current UpSizeR code has therefore been released for open-source development. The ultimate objective is to replace TPC with UpSizeR, so database owners can generate benchmarks that are relevant to their applications.
format text
author TAY, Y. C.
DAI, Bing Tian
WANG, Daniel T.
SUN, Eldora Y.
LIN, Yong
LIN, Yuting
title UpSizeR: Synthetically scaling an empirical relational database
publisher Institutional Knowledge at Singapore Management University
publishDate 2013