Autonomous learning machine on online big data analytics

Bibliographic Details
Main Author: Lim, Yan Jun
Other Authors: Mahardhika Pratama
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2019
Online Access: http://hdl.handle.net/10356/76918
Institution: Nanyang Technological University
Description
Summary: The term “Big Data” refers to datasets so large that existing database management tools cannot retrieve, store, handle, and analyze them. Although big data is most often associated with volume, researchers in the field have found that it is also characterized by other Vs, such as Variety, Velocity, and Veracity. Different data analytics tools have been proposed; one widely used approach is MapReduce from Google. Nevertheless, most existing works are offline in nature: they expect full access to the complete dataset and allow a machine learning algorithm to make multiple passes over all the data. In this project, an online parallelization technique is developed and integrated with an Autonomous Learning Machine (ALMA). In addition, a data fusion technique is developed to merge the outputs of ALMA from the different parallelized data partitions. Both techniques are implemented in R, using the RStudio environment and Apache Spark.
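
The idea of learning each partition in a single pass and then fusing the per-partition results can be illustrated with a much simpler stand-in than ALMA. The sketch below is not the project's method and is written in Python rather than R. It assumes each "model" is just a running mean and variance (Welford's update), and that fusion is the standard pairwise merge of such summaries. The names `RunningStats`, `learn_partition`, and `merge` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class RunningStats:
    """Single-pass summary of one partition (a stand-in for a learned model)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's update: each sample is seen once and can then be discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.n if self.n else 0.0


def learn_partition(stream) -> RunningStats:
    # Online learning on one partition: one pass, no stored history.
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


def merge(a: RunningStats, b: RunningStats) -> RunningStats:
    # Fuse two partition summaries without revisiting the raw data.
    n = a.n + b.n
    if n == 0:
        return RunningStats()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return RunningStats(n, mean, m2)


# Hypothetical data split into three partitions, as a cluster might hold it.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
fused = reduce(merge, (learn_partition(p) for p in partitions))
print(fused.n, fused.mean, fused.variance)
```

Because `merge` is associative, the partition summaries can be combined in any order. This is the property a reduce step on a cluster relies on, and it gives the same statistics as a single pass over the unpartitioned data.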