Accelerated big data analysis with deep generative models

Bibliographic Details
Main Author: Tan, Liang Wei
Other Authors: Gao CONG
Format: Final Year Project
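Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Computer science and engineering::Information systems::Database management; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/140453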
Institution: Nanyang Technological University

School: School of Computer Science and Engineering
Contact: Gao CONG (gaocong@ntu.edu.sg)
Degree: Bachelor of Engineering (Computer Science)
Project code: SCSE19-0315
File format: PDF
Date available: 2020-05-29
Repository: DR-NTU, NTU Library, Singapore

Abstract
Over the past decade, the growth of data has been phenomenal. The amount of data the world accumulates is expected to grow from 4.4 zettabytes today to 44 zettabytes (44 trillion gigabytes) by the end of the year. As more people gain access to the internet and to smart devices, this growth will continue to accelerate. Such exponential growth creates challenges for data analytics. Traditional data analytics tools such as Excel, SQL databases, and Hadoop answer a query by scanning the entire dataset, so they take too long to evaluate statistical queries over large datasets. Much faster techniques are needed.

In this project, we propose having a model learn a target dataset and then generate a small dataset whose statistical properties are similar to those of the target dataset. We call this small representative dataset a mini dataset. Queries computed on the mini dataset give results that are almost identical to those computed on the target dataset. Because the mini dataset has a much smaller memory footprint, computation times are much shorter.

Deep generative models provide exactly this capability. In this work, we use two state-of-the-art deep generative models to demonstrate the approach: Normalizing Flows [1], which are the main focus, and Variational Auto-encoders [2]. We first explain how these models can learn the distribution of a given dataset and generate mini datasets that resemble the learnt distribution. We then present and discuss experimental results obtained by testing these techniques on real datasets. Finally, we compare the advantages and disadvantages of the proposed techniques with other state-of-the-art techniques in Approximate Query Processing (AQP).
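The mini-dataset workflow described in the abstract can be sketched in a few lines. This is a minimal sketch, not the report's method. The report uses Normalizing Flows and Variational Auto-encoders. This example swaps in a deliberately simple stand-in generative model: a categorical distribution over a `category` column plus one Gaussian per category over a numeric `value` column, fitted by maximum likelihood. The dataset, the column layout, and the names `fit_model`, `sample_mini_dataset`, `avg_where` and `count_where` are all hypothetical. The sketch shows the three steps: learn the target data, sample a small synthetic dataset, and answer aggregate queries on that dataset instead of the full one.

```python
import random
import statistics
from collections import Counter, defaultdict


def fit_model(rows):
    """Fit a toy generative model to (category, value) rows by maximum likelihood.

    Hypothetical stand-in for a deep generative model: a categorical
    distribution over categories and one Gaussian over values per category.
    """
    counts = Counter(cat for cat, _ in rows)
    values = defaultdict(list)
    for cat, val in rows:
        values[cat].append(val)
    prior = {cat: n / len(rows) for cat, n in counts.items()}
    gaussians = {cat: (statistics.fmean(v), statistics.pstdev(v)) for cat, v in values.items()}
    return prior, gaussians


def sample_mini_dataset(model, m, rng):
    """Draw m synthetic rows from the fitted model (the "mini dataset")."""
    prior, gaussians = model
    cats = list(prior)
    mini = []
    for cat in rng.choices(cats, weights=[prior[c] for c in cats], k=m):
        mu, sigma = gaussians[cat]
        mini.append((cat, rng.gauss(mu, sigma)))
    return mini


def avg_where(rows, cat):
    """Equivalent of SELECT AVG(value) WHERE category = cat."""
    return statistics.fmean([v for c, v in rows if c == cat])


def count_where(rows, cat, scale=1.0):
    """Equivalent of SELECT COUNT(*) WHERE category = cat, multiplied by scale."""
    return scale * sum(1 for c, _ in rows if c == cat)


# Hypothetical target dataset: 100,000 rows, roughly 30% in category "A".
rng = random.Random(0)
target = [
    (c, rng.gauss(100.0 if c == "A" else 50.0, 10.0))
    for c in rng.choices("AB", weights=[0.3, 0.7], k=100_000)
]

model = fit_model(target)
mini = sample_mini_dataset(model, 2_000, rng)
scale = len(target) / len(mini)  # N / m: scales counts back up to the target size

print("AVG   exact:", avg_where(target, "A"), "approx:", avg_where(mini, "A"))
print("COUNT exact:", count_where(target, "A"), "approx:", count_where(mini, "A", scale))
```

AVG-style queries need no correction. COUNT-style (and SUM-style) queries must be scaled by N/m, the ratio of the target size to the mini-dataset size. In the approach the abstract describes, a deep generative model would take the place of `fit_model` and `sample_mini_dataset` to capture richer joint distributions. The query side would stay the same.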