Accelerated big data analysis with deep generative models
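Main Author: | Tan, Liang Wei |
---|---|
Other Authors: | Gao CONG |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Engineering::Computer science and engineering::Information systems::Database management; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/140453 |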
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-140453 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-140453; 2020-05-29T04:42:48Z; Tan, Liang Wei; Gao CONG; School of Computer Science and Engineering; gaocong@ntu.edu.sg; Bachelor of Engineering (Computer Science); Final Year Project (FYP); SCSE19-0315; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Information systems::Database management Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
description |
Over the past decade, the growth of data has been phenomenal. The amount of data the world accumulates is projected to grow from 4.4 zettabytes in 2013 to 44 zettabytes (44 trillion gigabytes) by the end of 2020, and as more people gain access to the internet and smart devices, this growth is expected to keep accelerating. This exponential growth creates challenges for data analytics. Traditional analytics tools such as Excel, SQL databases, and Hadoop answer a query by scanning the entire dataset, so evaluating statistical queries over large datasets takes too long; much faster techniques are needed. In this project, we propose having a model learn a target dataset and then generate a small dataset with similar statistical properties, which we call a mini dataset. Queries computed on the mini dataset return results that closely approximate those computed on the target dataset, and because the mini dataset has a much smaller memory footprint, computation times are much shorter. Deep generative models are well suited to this task. In this work, we demonstrate the approach with two state-of-the-art deep generative models: Normalizing Flows [1], which are the main focus, and Variational Autoencoders [2]. We first explain how these models can learn a given data distribution and generate mini datasets that resemble it. We then present and discuss experimental results from testing our techniques on real datasets. Finally, we compare the advantages and disadvantages of our techniques with those of other state-of-the-art Approximate Query Processing (AQP) techniques. |
author2 |
Gao CONG |
format |
Final Year Project |
author |
Tan, Liang Wei |
title |
Accelerated big data analysis with deep generative models |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/140453 |
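The description field outlines a three-step pipeline: learn the distribution of a target dataset, sample a small synthetic "mini dataset" from the learnt model, and answer aggregate queries on the mini dataset instead of the full table. The following is a minimal sketch of that pipeline under stated assumptions; it is not the project's code, models, or data. In place of a deep normalizing flow, it uses the simplest possible flow, a single affine map x = mu + L z applied to standard-normal noise z (equivalent to fitting a multivariate Gaussian by maximum likelihood), so it runs with the Python standard library alone. The names `AffineFlow`, `cholesky`, `avg`, and `over_50`, and the synthetic (age, income) table, are all hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


class AffineFlow:
    """Hypothetical one-layer normalizing flow: x = mu + L z with z ~ N(0, I)."""

    def fit(self, rows: List[Vector]) -> "AffineFlow":
        # For this flow the maximum-likelihood parameters have a closed form:
        # the sample mean and the Cholesky factor of the (1/n) sample covariance.
        n, d = len(rows), len(rows[0])
        self.mu = [sum(r[j] for r in rows) / n for j in range(d)]
        cov = [[sum((r[i] - self.mu[i]) * (r[j] - self.mu[j]) for r in rows) / n
                for j in range(d)] for i in range(d)]
        self.low = cholesky(cov)
        return self

    def inverse(self, x: Vector) -> Vector:
        # z = L^{-1} (x - mu), solved by forward substitution.
        z: Vector = []
        for i in range(len(x)):
            s = sum(self.low[i][k] * z[k] for k in range(i))
            z.append((x[i] - self.mu[i] - s) / self.low[i][i])
        return z

    def log_prob(self, x: Vector) -> float:
        # Change of variables: log p_X(x) = log p_Z(z) - log|det L|, z = f^{-1}(x).
        z = self.inverse(x)
        d = len(z)
        log_pz = -0.5 * sum(v * v for v in z) - 0.5 * d * math.log(2 * math.pi)
        log_det = sum(math.log(self.low[i][i]) for i in range(d))
        return log_pz - log_det

    def sample(self, m: int, rng: random.Random) -> List[Vector]:
        # Push m standard-normal draws through x = mu + L z: the "mini dataset".
        d = len(self.mu)
        rows = []
        for _ in range(m):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            rows.append([self.mu[i] + sum(self.low[i][k] * z[k] for k in range(i + 1))
                         for i in range(d)])
        return rows


def avg(rows: List[Vector], col: int, where: Callable[[Vector], bool]) -> float:
    """Evaluate SELECT AVG(col) FROM rows WHERE where(row)."""
    selected = [r[col] for r in rows if where(r)]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical target table of 100,000 (age, income) rows with correlated columns.
    target = []
    for _ in range(100_000):
        age = rng.gauss(40.0, 10.0)
        target.append([age, 1000.0 * age + rng.gauss(0.0, 5000.0)])

    mini = AffineFlow().fit(target).sample(1_000, rng)

    def over_50(row: Vector) -> bool:
        return row[0] > 50.0

    # The two answers should be close; the mini-dataset query scans 1% of the rows.
    print("exact :", avg(target, 1, over_50))
    print("approx:", avg(mini, 1, over_50))
```

Because the toy table is itself Gaussian, one affine layer can fit it well. Real tables are not Gaussian, which is why practical normalizing flows stack many nonlinear invertible layers and train them by gradient descent on the same change-of-variables log-likelihood that `log_prob` computes here.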