Autonomous learning machine on online big data analytics

Bibliographic Details
Main Author: Lim, Yan Jun
Other Authors: Mahardhika Pratama
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2019
Online Access: http://hdl.handle.net/10356/76918
Description
Summary: The term “Big Data” refers to datasets so large that existing database management tools cannot retrieve, store, handle, and analyze them. Although big data is most often associated with volume, researchers in the field have found that it is also characterized by four other Vs: Variety, Velocity, Veracity, and Value. Different data analytics tools have been proposed. One widely used approach is MapReduce, from Google. Nevertheless, most existing work is offline in nature: it expects full access to the complete dataset and lets a machine learning algorithm make multiple passes over all the data. In this project, an online parallelization technique is developed that integrates an Autonomous Learning Machine (ALMA). In addition, a data fusion technique is developed to merge the outputs of ALMA from the different parallelized data partitions. Both techniques are implemented in R, using the RStudio environment and Apache Spark.
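
The pattern the summary describes, a single-pass learner running on each data partition followed by a fusion step that merges the per-partition results, can be illustrated with a minimal sketch. This is not the project's ALMA model or its fusion rule, neither of which is specified in this record. As a loudly labeled stand-in, the "learner" here is just a running mean and variance (Welford's online update), and the "fusion" is the standard pairwise merge of such summaries. The names `RunningStats`, `learn_partition`, `fuse`, and `fuse_all` are hypothetical, and the sketch is written in plain Python rather than the R and Apache Spark setup used in the project.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RunningStats:
    """Stand-in online learner (NOT ALMA): count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Welford's online update: each sample is seen once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.n if self.n else 0.0


def learn_partition(stream: Iterable[float]) -> RunningStats:
    """Make a single pass over one partition and return its summary."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


def fuse(a: RunningStats, b: RunningStats) -> RunningStats:
    """Merge two partition summaries without revisiting the raw data."""
    n = a.n + b.n
    if n == 0:
        return RunningStats()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return RunningStats(n, mean, m2)


def fuse_all(parts: List[RunningStats]) -> RunningStats:
    """Fold the per-partition summaries into one global summary."""
    result = RunningStats()
    for part in parts:
        result = fuse(result, part)
    return result


if __name__ == "__main__":
    # Three hypothetical partitions of one data stream.
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0]]
    fused = fuse_all([learn_partition(p) for p in partitions])
    print(fused.n, fused.mean, fused.variance)
```

Because each partition is summarized in one pass and the merge needs only the summaries, the fused result matches what a single learner would produce over the whole stream. That property holds for this simple statistic; whether and how it carries over to a richer model such as ALMA depends on the fusion technique developed in the project itself.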