Evolving large-scale data stream analytics based on scalable PANFIS

The main challenge in large-scale data stream analytics lies in the ability of machine learning to generate large-scale data knowledge in a reasonable timeframe without suffering from a loss of accuracy. Many distributed machine learning frameworks have recently been built to speed up the large-scale...

Bibliographic Details
Main Authors:	Za'in, Choiru; Pratama, Mahardhika; Pardede, Eric
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2021
Subjects:	Engineering::Computer science and engineering; Large-scale Data Stream Analytics; Distributed Data Stream Mining
Online Access:	https://hdl.handle.net/10356/151672
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-151672
record_format	dspace
spelling	sg-ntu-dr.10356-1516722021-07-14T07:16:22Z Evolving large-scale data stream analytics based on scalable PANFIS Za'in, Choiru Pratama, Mahardhika Pardede, Eric School of Computer Science and Engineering Engineering::Computer science and engineering Large-scale Data Stream Analytics Distributed Data Stream Mining The main challenge in large-scale data stream analytics lies in the ability of machine learning to generate large-scale data knowledge in reasonable timeframe without suffering from a loss of accuracy. Many distributed machine learning frameworks have recently been built to speed up the large-scale data learning process. However, most distributed machine learning used in these frameworks still uses an offline algorithm model which cannot cope with the data stream problems. In fact, large-scale data are mostly generated by the non-stationary data stream where its pattern evolves over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving algorithm is distributed over the worker nodes in the cloud to learn large-scale data stream. Scalable PANFIS framework incorporates the active learning (AL) strategy and two model fusion methods. The AL accelerates the distributed learning process to generate an initial evolving large-scale data stream model (initial model), whereas the two model fusion methods aggregate an initial model to generate the final model. The final model represents the update of current large-scale data knowledge which can be used to infer future data. Extensive experiments on this framework are validated by measuring the accuracy and running time of four combinations of Scalable PANFIS and other Spark-based built in algorithms. The results indicate that Scalable PANFIS with AL improves the training time to be almost two times faster than Scalable PANFIS without AL. The results also show both rule merging and the voting mechanisms yield similar accuracy in general among Scalable PANFIS algorithms and they are generally better than Spark-based algorithms. In terms of running time, the Scalable PANFIS training time outperforms all Spark-based algorithms when classifying a multi-class label dataset. Ministry of Education (MOE) Nanyang Technological University This project is fully supported by NTU, Singapore start up grant and MOE tier 1 research grant. This research is also supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS). 2021-07-14T07:16:22Z 2021-07-14T07:16:22Z 2019 Journal Article Za'in, C., Pratama, M. & Pardede, E. (2019). Evolving large-scale data stream analytics based on scalable PANFIS. Knowledge-Based Systems, 166, 186-197. https://dx.doi.org/10.1016/j.knosys.2018.12.028 0950-7051 https://hdl.handle.net/10356/151672 10.1016/j.knosys.2018.12.028 2-s2.0-85059525095 166 186 197 en Knowledge-Based Systems © 2019 Elsevier B.V. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering Large-scale Data Stream Analytics Distributed Data Stream Mining
spellingShingle	Engineering::Computer science and engineering Large-scale Data Stream Analytics Distributed Data Stream Mining Za'in, Choiru Pratama, Mahardhika Pardede, Eric Evolving large-scale data stream analytics based on scalable PANFIS
description	The main challenge in large-scale data stream analytics lies in the ability of machine learning to generate large-scale data knowledge in a reasonable timeframe without suffering from a loss of accuracy. Many distributed machine learning frameworks have recently been built to speed up the large-scale data learning process. However, most distributed machine learning methods used in these frameworks still rely on an offline algorithm model, which cannot cope with data stream problems. In fact, large-scale data are mostly generated by non-stationary data streams whose patterns evolve over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving algorithm is distributed over the worker nodes in the cloud to learn large-scale data streams. The Scalable PANFIS framework incorporates an active learning (AL) strategy and two model fusion methods. AL accelerates the distributed learning process to generate an initial evolving large-scale data stream model (initial model), whereas the two model fusion methods aggregate the initial model to generate the final model. The final model represents the update of current large-scale data knowledge, which can be used to infer future data. The framework is validated through extensive experiments that measure the accuracy and running time of four combinations of Scalable PANFIS and other Spark-based built-in algorithms. The results indicate that Scalable PANFIS with AL trains almost two times faster than Scalable PANFIS without AL. The results also show that the rule merging and voting mechanisms generally yield similar accuracy among the Scalable PANFIS algorithms, and that they are generally better than the Spark-based algorithms. In terms of running time, Scalable PANFIS trains faster than all Spark-based algorithms when classifying a multi-class label dataset.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Za'in, Choiru Pratama, Mahardhika Pardede, Eric
format	Article
author	Za'in, Choiru Pratama, Mahardhika Pardede, Eric
author_sort	Za'in, Choiru
title	Evolving large-scale data stream analytics based on scalable PANFIS
title_short	Evolving large-scale data stream analytics based on scalable PANFIS
title_full	Evolving large-scale data stream analytics based on scalable PANFIS
title_fullStr	Evolving large-scale data stream analytics based on scalable PANFIS
title_full_unstemmed	Evolving large-scale data stream analytics based on scalable PANFIS
title_sort	evolving large-scale data stream analytics based on scalable panfis
publishDate	2021
url	https://hdl.handle.net/10356/151672
_version_	1707050442763010048

Similar Items