Accelerated big data analysis with deep generative models

Over the past decade, the growth of data has been phenomenal. The amount of data which the world accumulates is expected to increase from 4.4 zettabytes today to 44 zettabytes (44 trillion gigabytes) by the end of the year and with more people getting access to the internet as well as smart devices,...

Bibliographic Details
Main Author:	Tan, Liang Wei
Other Authors:	Gao CONG
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Computer science and engineering::Information systems::Database management; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/140453

id	sg-ntu-dr.10356-140453
record_format	dspace
spelling	sg-ntu-dr.10356-1404532020-05-29T04:42:48Z Accelerated big data analysis with deep generative models Tan, Liang Wei Gao CONG School of Computer Science and Engineering gaocong@ntu.edu.sg Engineering::Computer science and engineering::Information systems::Database management Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Over the past decade, the growth of data has been phenomenal. The amount of data which the world accumulates is expected to increase from 4.4 zettabytes today to 44 zettabytes (44 trillion gigabytes) by the end of the year and with more people getting access to the internet as well as smart devices, this rate of growth will continue to skyrocket in the near future. This exponential rate of data growth is creating challenges in data analytics. Traditional computing tools for data analytics which process queries by running through the entire database such as Excel, SQL databases, or Hadoop, take too much time to evaluate statistical queries for large datasets. We need techniques which are much faster. In this project, we propose solutions that involve having a model learn about a target dataset and then, generate a small dataset which has similar statistical properties to the target dataset. We call this small representation dataset a mini dataset. Queries computed on the mini dataset give results which are almost identical to the results obtained by computing on the target dataset. However, because the mini dataset has a much smaller memory footprint, computation times are much shorter. It turns out that Deep Generative Models do exactly what we need. In this work, we use two state-of-the-art Deep Generative Models, Normalizing Flows (more focus on this) [1] and Variational Auto-encoders [2] to demonstrate this. We start by explaining how these models can be used to learn any given data distribution and generate mini datasets resembling the learnt data distribution. 
Then, we show and discuss the experimental results obtained by testing our techniques on real datasets. Finally, we compare the advantages and disadvantages of our proposed techniques with other state-of-the-art techniques in Approximate Query Processing (AQP). Bachelor of Engineering (Computer Science) 2020-05-29T04:42:48Z 2020-05-29T04:42:48Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/140453 en SCSE19-0315 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Information systems::Database management; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle	Engineering::Computer science and engineering::Information systems::Database management Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Tan, Liang Wei Accelerated big data analysis with deep generative models
description	Over the past decade, the growth of data has been phenomenal. The amount of data which the world accumulates is expected to increase from 4.4 zettabytes today to 44 zettabytes (44 trillion gigabytes) by the end of the year and with more people getting access to the internet as well as smart devices, this rate of growth will continue to skyrocket in the near future. This exponential rate of data growth is creating challenges in data analytics. Traditional computing tools for data analytics which process queries by running through the entire database such as Excel, SQL databases, or Hadoop, take too much time to evaluate statistical queries for large datasets. We need techniques which are much faster. In this project, we propose solutions that involve having a model learn about a target dataset and then, generate a small dataset which has similar statistical properties to the target dataset. We call this small representation dataset a mini dataset. Queries computed on the mini dataset give results which are almost identical to the results obtained by computing on the target dataset. However, because the mini dataset has a much smaller memory footprint, computation times are much shorter. It turns out that Deep Generative Models do exactly what we need. In this work, we use two state-of-the-art Deep Generative Models, Normalizing Flows (more focus on this) [1] and Variational Auto-encoders [2] to demonstrate this. We start by explaining how these models can be used to learn any given data distribution and generate mini datasets resembling the learnt data distribution. Then, we show and discuss the experimental results obtained by testing our techniques on real datasets. Finally, we compare the advantages and disadvantages of our proposed techniques with other state-of-the-art techniques in Approximate Query Processing (AQP).
author2	Gao CONG
author_facet	Gao CONG Tan, Liang Wei
format	Final Year Project
author	Tan, Liang Wei
author_sort	Tan, Liang Wei
title	Accelerated big data analysis with deep generative models
title_short	Accelerated big data analysis with deep generative models
title_full	Accelerated big data analysis with deep generative models
title_fullStr	Accelerated big data analysis with deep generative models
title_full_unstemmed	Accelerated big data analysis with deep generative models
title_sort	accelerated big data analysis with deep generative models
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/140453
_version_	1681057640908587008
