Scalable teacher forcing network for semi-supervised large scale data streams

The large-scale data stream problem refers to a high-speed information flow that cannot be processed in a scalable manner on a traditional computing platform. This problem also imposes expensive labelling costs, making the deployment of fully supervised algorithms unfeasible. On the other hand, the problem of semi-supervised large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised approaches. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is built on the Apache Spark distributed computing platform with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems while integrating a data augmentation, annotation and auto-correction (DA3) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only a 25% label proportion. It shows highly competitive performance even when compared with fully supervised learners with 100% label proportions.
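The abstract describes a Spark-based parallel learning stage, a data-free model fusion step after that stage, and DA3-style self-annotation of unlabelled samples. The sketch below is a minimal, hypothetical illustration of that general pattern only, not the authors' WeScatterNet implementation: it partitions a partially labelled data chunk across Spark workers, trains a simple local classifier per partition (scikit-learn's SGDClassifier stands in for the evolving network), pseudo-labels confident unlabelled samples as a crude stand-in for DA3, and averages the local linear parameters as a stand-in for data-free fusion. The helper name pseudo_label_and_fit, the 0.9 confidence threshold and the toy data are assumptions for illustration.

# Minimal illustrative sketch only; see the caveats above. Assumes pyspark,
# numpy and scikit-learn are available on the driver and the workers.
import numpy as np
from pyspark.sql import SparkSession
from sklearn.linear_model import SGDClassifier

def pseudo_label_and_fit(partition, n_classes=2, threshold=0.9):
    # Train one local model per Spark partition; label -1 marks unlabelled rows.
    rows = list(partition)
    if not rows:
        return iter([])
    X = np.array([x for x, _ in rows])
    y = np.array([lbl for _, lbl in rows])
    labelled = y != -1
    if not labelled.any():
        return iter([])
    clf = SGDClassifier(loss="log_loss")
    clf.partial_fit(X[labelled], y[labelled], classes=np.arange(n_classes))
    # Self-annotate confident unlabelled samples (a crude stand-in for DA3).
    if (~labelled).any():
        proba = clf.predict_proba(X[~labelled])
        keep = proba.max(axis=1) >= threshold
        if keep.any():
            clf.partial_fit(X[~labelled][keep], proba[keep].argmax(axis=1))
    return iter([(clf.coef_, clf.intercept_)])

spark = SparkSession.builder.appName("weakly-supervised-stream-sketch").getOrCreate()
rng = np.random.default_rng(0)
# Toy data chunk: roughly 25% labelled (0/1), the rest unlabelled (-1).
chunk = [(rng.normal(size=10), int(rng.integers(0, 2)) if rng.random() < 0.25 else -1)
         for _ in range(2000)]
local_models = (spark.sparkContext.parallelize(chunk, numSlices=4)
                .mapPartitions(pseudo_label_and_fit)
                .collect())
# "Data-free" fusion stand-in: average the local parameters on the driver,
# without touching the raw data again.
fused_coef = np.mean([coef for coef, _ in local_models], axis=0)
fused_intercept = np.mean([b for _, b in local_models], axis=0)
spark.stop()

Run in Spark local mode this is a single-machine demonstration; in the setting the abstract describes, each partition would correspond to a worker processing part of a large-scale stream chunk before the fused model is carried forward.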


Bibliographic Details
Main Authors: Pratama, Mahardhika; Za'in, Choiru; Lughofer, Edwin; Pardede, Eric; Rahayu, Dwi A. P.
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: Information Sciences, 2021
Subjects: Engineering::Computer science and engineering; Evolving Fuzzy Systems; Concept Drifts
Online Access: https://hdl.handle.net/10356/159514
Institution: Nanyang Technological University
Citation: Pratama, M., Za'in, C., Lughofer, E., Pardede, E. & Rahayu, D. A. P. (2021). Scalable teacher forcing network for semi-supervised large scale data streams. Information Sciences, 576, 407-431. https://dx.doi.org/10.1016/j.ins.2021.06.075
ISSN: 0020-0255
DOI: 10.1016/j.ins.2021.06.075
Scopus ID: 2-s2.0-85109455526
Funding: This work is supported by a Ministry of Education (MOE) Republic of Singapore Tier 1 research grant. The third author acknowledges the support of the 'LCM - K2 Center for Symbiotic Mechatronics' within the framework of the Austrian COMET-K2 program.
Rights: © 2021 Elsevier Inc. All rights reserved.