A novel drift detection algorithm based on features’ importance analysis in a data streams environment

The training set consists of many features that influence the classifier in different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operating of the learned model. In the case of data streams, the importance of...

Bibliographic Details
Main Authors: Duda, Piotr, Przybyszewski, Krzysztof, Wang, Lipo
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Data Stream Mining; Random Forest
Online Access: https://hdl.handle.net/10356/145350
Institution: Nanyang Technological University
id sg-ntu-dr.10356-145350
record_format dspace
spelling sg-ntu-dr.10356-145350 2020-12-18T02:27:06Z A novel drift detection algorithm based on features’ importance analysis in a data streams environment Duda, Piotr Przybyszewski, Krzysztof Wang, Lipo School of Electrical and Electronic Engineering Engineering::Computer science and engineering Data Stream Mining Random Forest The training set consists of many features that influence the classifier in different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operating of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of occurring concept-drift. In this work, we propose a new algorithm for data streams classification, called Random Forest with Features Importance (RFFI), which uses the measure of features importance as a drift detector. The RFFI algorithm implements solutions inspired by the Random Forest algorithm to the data stream scenarios. The proposed algorithm combines the ability of ensemble methods for handling slow changes in a data stream with a new method for detecting concept drift occurrence. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data. Published version 2020-12-18T02:27:06Z 2020-12-18T02:27:06Z 2020 Journal Article Duda, P., Przybyszewski, K., & Wang, L. (2020). A novel drift detection algorithm based on features’ importance analysis in a data streams environment. Journal of Artificial Intelligence and Soft Computing Research, 10(4), 287-298. doi:10.2478/jaiscr-2020-0019 2083-2567 https://hdl.handle.net/10356/145350 10.2478/jaiscr-2020-0019 4 10 287 298 en Journal of Artificial Intelligence and Soft Computing Research © 2020 The Author(s) (published by Sciendo).
This is an open-access article distributed under the terms of the Creative Commons Attribution License. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Data Stream Mining
Random Forest
spellingShingle Engineering::Computer science and engineering
Data Stream Mining
Random Forest
Duda, Piotr
Przybyszewski, Krzysztof
Wang, Lipo
A novel drift detection algorithm based on features’ importance analysis in a data streams environment
description The training set consists of many features that influence the classifier in different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operating of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of occurring concept-drift. In this work, we propose a new algorithm for data streams classification, called Random Forest with Features Importance (RFFI), which uses the measure of features importance as a drift detector. The RFFI algorithm implements solutions inspired by the Random Forest algorithm to the data stream scenarios. The proposed algorithm combines the ability of ensemble methods for handling slow changes in a data stream with a new method for detecting concept drift occurrence. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data.
author2 School of Electrical and Electronic Engineering
author_facet School of Electrical and Electronic Engineering
Duda, Piotr
Przybyszewski, Krzysztof
Wang, Lipo
format Article
author Duda, Piotr
Przybyszewski, Krzysztof
Wang, Lipo
author_sort Duda, Piotr
title A novel drift detection algorithm based on features’ importance analysis in a data streams environment
title_short A novel drift detection algorithm based on features’ importance analysis in a data streams environment
title_full A novel drift detection algorithm based on features’ importance analysis in a data streams environment
title_fullStr A novel drift detection algorithm based on features’ importance analysis in a data streams environment
title_full_unstemmed A novel drift detection algorithm based on features’ importance analysis in a data streams environment
title_sort novel drift detection algorithm based on features’ importance analysis in a data streams environment
publishDate 2020
url https://hdl.handle.net/10356/145350
_version_ 1688665700451745792