Autonomous learning machine on online big data analytics
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2019 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/76918 |
Institution: | Nanyang Technological University |
Summary: | The term “Big Data” refers to datasets so large that existing database
management tools cannot retrieve, store, handle, or analyze them.
Although big data is most often associated with volume, researchers in the field
have found that it is also characterized by other Vs, such as Variety, Velocity, and Veracity.
Different data analytics tools have been proposed. One widely used
approach is MapReduce from Google.
Nevertheless, most existing work is offline in nature: it assumes full access to the
complete dataset so that a machine learning algorithm can make multiple passes over
all the data.
In this project, an online parallelization technique is developed that integrates an
Autonomous Learning Machine (ALMA). In addition, a data fusion technique is
developed to merge the outputs of ALMA from the different parallelized data
partitions. Both techniques are implemented in R, using the RStudio
environment and Apache Spark. |
---|