A novel drift detection algorithm based on features’ importance analysis in a data streams environment
The training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of...
Main Authors: | Duda, Piotr; Przybyszewski, Krzysztof; Wang, Lipo |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Article |
Language: | English |
Published: | 2020 |
Subjects: | Engineering::Computer science and engineering; Data Stream Mining; Random Forest |
Online Access: | https://hdl.handle.net/10356/145350 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-145350 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1453502020-12-18T02:27:06Z A novel drift detection algorithm based on features’ importance analysis in a data streams environment Duda, Piotr Przybyszewski, Krzysztof Wang, Lipo School of Electrical and Electronic Engineering Engineering::Computer science and engineering Data Stream Mining Random Forest The training set consists of many features that influence the classifier in different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operating of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of occurring concept-drift. In this work, we propose a new algorithm for data streams classification, called Random Forest with Features Importance (RFFI), which uses the measure of features importance as a drift detector. The RFFI algorithm implements solutions inspired by the Random Forest algorithm to the data stream scenarios. The proposed algorithm combines the ability of ensemble methods for handling slow changes in a data stream with a new method for detecting concept drift occurrence. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data. Published version 2020-12-18T02:27:06Z 2020-12-18T02:27:06Z 2020 Journal Article Duda, P., Przybyszewski, K., & Wang, L. (2020). A novel drift detection algorithm based on features’ importance analysis in a data streams environment. Journal of Artificial Intelligence and Soft Computing Research, 10(4), 287-298. doi:10.2478/jaiscr-2020-0019 2083-2567 https://hdl.handle.net/10356/145350 10.2478/jaiscr-2020-0019 4 10 287 298 en Journal of Artificial Intelligence and Soft Computing Research © 2020 The Author(s) (published by Sciendo). 
This is an open-access article distributed under the terms of the Creative Commons Attribution License. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering Data Stream Mining Random Forest |
spellingShingle |
Engineering::Computer science and engineering Data Stream Mining Random Forest Duda, Piotr Przybyszewski, Krzysztof Wang, Lipo A novel drift detection algorithm based on features’ importance analysis in a data streams environment |
description |
The training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of concept drift. In this work, we propose a new algorithm for data stream classification, called Random Forest with Features Importance (RFFI), which uses a measure of feature importance as a drift detector. The RFFI algorithm adapts solutions inspired by the Random Forest algorithm to data stream scenarios. The proposed algorithm combines the ability of ensemble methods to handle slow changes in a data stream with a new method for detecting concept drift. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Duda, Piotr Przybyszewski, Krzysztof Wang, Lipo |
format |
Article |
author |
Duda, Piotr Przybyszewski, Krzysztof Wang, Lipo |
author_sort |
Duda, Piotr |
title |
A novel drift detection algorithm based on features’ importance analysis in a data streams environment |
title_sort |
novel drift detection algorithm based on features’ importance analysis in a data streams environment |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/145350 |
_version_ |
1688665700451745792 |
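
The description above states that a change in feature importance over time can serve as an indicator of concept drift. The following is a minimal sketch of that general idea only, not the RFFI algorithm from the article. It assumes binary labels, uses the gap between class-conditional feature means as a stand-in importance measure, and flags drift when the L1 distance between the importance vectors of two windows exceeds a threshold. The function names, the proxy measure, and the threshold value are all hypothetical choices made for illustration.

```python
from statistics import mean


def feature_importance(window):
    """Return a normalized proxy importance for each feature.

    `window` is a list of (features, label) pairs with labels 0 or 1.
    The proxy (absolute gap between class-conditional means) is an
    assumption for illustration, not the measure used in the article.
    """
    n_features = len(window[0][0])
    gaps = []
    for j in range(n_features):
        pos = [x[j] for x, y in window if y == 1]
        neg = [x[j] for x, y in window if y == 0]
        # If one class is missing from the window, treat the feature as uninformative.
        gaps.append(abs(mean(pos) - mean(neg)) if pos and neg else 0.0)
    total = sum(gaps)
    return [g / total if total > 0 else 0.0 for g in gaps]


def importance_drift(reference, current, threshold=0.5):
    """Compare two windows and report (drift_flag, L1 distance).

    The threshold of 0.5 is an arbitrary illustrative value.
    """
    ref_imp = feature_importance(reference)
    cur_imp = feature_importance(current)
    distance = sum(abs(a - b) for a, b in zip(ref_imp, cur_imp))
    return distance > threshold, distance
```

In a streaming setting, `reference` would typically be an older window of labeled examples and `current` the most recent one. In a tree ensemble, the importance vector would more naturally come from the forest itself than from this simple proxy.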