Efficent sampling procedure for small storage devices

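Sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population. For large, multi-dimensional databases, data analytics algorithms might require multiple iterations over the whole database, which can be very expensive in terms of time. However, in many applications, approximate (rather than exact) answers to queries are often more than satisfactory. For such applications, by drilling down to a sample of members, one can quickly analyze a large multi-dimensional database with a focus on data trends or approximate information in the initial stage. In this project, a distance-based sampling algorithm, DSSC (Distance based Sampling for Streaming data with Continuous attributes), is proposed. DSSC can be used in applications that require a high-quality sample but are limited in memory and processing power, such as those on mobile devices. Preliminary results on data sets show that DSSC is robust to noise and requires little memory space. We prove that the cost of an incoming transaction is at most O(n·|T|).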

Bibliographic Details
Main Author: Agrawal, Rohit
Other Authors: Ong Keng Sian, Vincent
Format: Final Year Project
Language: English
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10356/54266
Institution: Nanyang Technological University
id sg-ntu-dr.10356-54266
record_format dspace
spelling sg-ntu-dr.10356-54266; 2023-07-07T16:31:21Z; School of Electrical and Electronic Engineering; Bachelor of Engineering; Final Year Project (FYP); 2013-06-18T04:01:27Z; 2013; en; 58 p.; application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
description Sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population. For large, multi-dimensional databases, data analytics algorithms might require multiple iterations over the whole database, which can be very expensive in terms of time. However, in many applications, approximate (rather than exact) answers to queries are often more than satisfactory. For such applications, by drilling down to a sample of members, one can quickly analyze a large multi-dimensional database with a focus on data trends or approximate information in the initial stage. In this project, a distance-based sampling algorithm, DSSC (Distance based Sampling for Streaming data with Continuous attributes), is proposed. DSSC can be used in applications that require a high-quality sample but are limited in memory and processing power, such as those on mobile devices. Preliminary results on data sets show that DSSC is robust to noise and requires little memory space. We prove that the cost of an incoming transaction is at most O(n·|T|).
author2 Ong Keng Sian, Vincent
format Final Year Project
author Agrawal, Rohit
title Efficent sampling procedure for small storage devices
publishDate 2013
url http://hdl.handle.net/10356/54266
_version_ 1772825794469429248