A two-stage feature selection algorithm based on redundancy and relevance

Thanks to technological advances, it is now possible to routinely collect large volumes of data and to use these data for many applications. However, this also yields datasets with very large numbers of samples and features. Dealing with high-volume and high-dimensional data i...

Bibliographic Details
Main Authors: Antioquia, Arren Matthew C., Azcarraga, Arnulfo P.
Format: text
Published: Animo Repository 2018
Subjects: Learning classifier systems; Big data; Document clustering; Nearest neighbor analysis (Statistics)
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/4428
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-5304
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:faculty_research-5304 2022-01-06T05:47:18Z A two-stage feature selection algorithm based on redundancy and relevance Antioquia, Arren Matthew C. Azcarraga, Arnulfo P. Thanks to technological advances, it is now possible to routinely collect large volumes of data and to use these data for many applications. However, this also yields datasets with very large numbers of samples and features. Dealing with high-volume and high-dimensional data is a major challenge for machine learning algorithms, especially in terms of memory requirements and model training time. Fortunately, many of the features in the collected data are usually correlated, and some can even be completely irrelevant to a specific classification or pattern recognition task, so the large feature set can be reduced by removing redundant and irrelevant features. This paper proposes a two-stage feature selection algorithm based on feature redundancy and feature relevance. The proposed algorithm employs a hybrid model that combines filter and wrapper schemes to select an optimal feature subset. Five datasets from different domains are used to test the performance of the proposed algorithm with three well-known machine learning algorithms, namely k-Nearest Neighbor, Decision Trees, and Multilayer Perceptrons. Despite the reduced number of features, the classification performance of the selected feature subsets is on par with, or even significantly higher than, that of the original feature set. Compared with other state-of-the-art feature selection algorithms, the proposed method achieves higher classification accuracy with even fewer features. © 2018 IEEE. 2018-10-10T07:00:00Z text https://animorepository.dlsu.edu.ph/faculty_research/4428 info:doi/10.1109/IJCNN.2018.8489072 Faculty Research Work Animo Repository Learning classifier systems Big data Document clustering Nearest neighbor analysis (Statistics)
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Learning classifier systems
Big data
Document clustering
Nearest neighbor analysis (Statistics)
spellingShingle Learning classifier systems
Big data
Document clustering
Nearest neighbor analysis (Statistics)
Antioquia, Arren Matthew C.
Azcarraga, Arnulfo P.
A two-stage feature selection algorithm based on redundancy and relevance
description Thanks to technological advances, it is now possible to routinely collect large volumes of data and to use these data for many applications. However, this also yields datasets with very large numbers of samples and features. Dealing with high-volume and high-dimensional data is a major challenge for machine learning algorithms, especially in terms of memory requirements and model training time. Fortunately, many of the features in the collected data are usually correlated, and some can even be completely irrelevant to a specific classification or pattern recognition task, so the large feature set can be reduced by removing redundant and irrelevant features. This paper proposes a two-stage feature selection algorithm based on feature redundancy and feature relevance. The proposed algorithm employs a hybrid model that combines filter and wrapper schemes to select an optimal feature subset. Five datasets from different domains are used to test the performance of the proposed algorithm with three well-known machine learning algorithms, namely k-Nearest Neighbor, Decision Trees, and Multilayer Perceptrons. Despite the reduced number of features, the classification performance of the selected feature subsets is on par with, or even significantly higher than, that of the original feature set. Compared with other state-of-the-art feature selection algorithms, the proposed method achieves higher classification accuracy with even fewer features. © 2018 IEEE.
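The record itself contains no code, but the abstract describes a two-stage, filter-plus-wrapper pipeline, and a minimal Python sketch of that general idea may help illustrate it. The relevance score (mutual information), the correlation threshold for redundancy, the greedy forward search, and the example dataset below are illustrative assumptions, not the criteria or data used in the paper.

# Illustrative filter-then-wrapper feature selection (not the authors' exact algorithm).
# Stage 1 (filter): drop features that are weakly related to the label (low relevance)
# or highly correlated with an already kept feature (redundancy).
# Stage 2 (wrapper): greedy forward selection scored by k-NN cross-validation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# --- Stage 1: filter by relevance, then prune redundant features ---
relevance = mutual_info_classif(X, y, random_state=0)
# Thresholds here are arbitrary choices for the sketch.
candidates = [i for i in np.argsort(relevance)[::-1] if relevance[i] > 0.01]

kept = []
corr = np.corrcoef(X, rowvar=False)  # pairwise feature correlations
for i in candidates:
    # Skip a feature that is strongly correlated with one already kept (redundant).
    if all(abs(corr[i, j]) < 0.9 for j in kept):
        kept.append(i)

# --- Stage 2: wrapper via greedy forward selection with k-NN ---
model = KNeighborsClassifier(n_neighbors=5)
selected, best_score = [], 0.0
improved = True
while improved:
    improved = False
    for i in [f for f in kept if f not in selected]:
        score = cross_val_score(model, X[:, selected + [i]], y, cv=5).mean()
        if score > best_score:
            best_score, best_feature = score, i
            improved = True
    if improved:
        selected.append(best_feature)

print(f"Selected {len(selected)} of {X.shape[1]} features, CV accuracy {best_score:.3f}")

The design rationale mirrors the abstract's motivation: the cheap filter stage first removes irrelevant and redundant features, so the more accurate but expensive wrapper search only has to evaluate classifier performance over the already reduced candidate set.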
format text
author Antioquia, Arren Matthew C.
Azcarraga, Arnulfo P.
author_facet Antioquia, Arren Matthew C.
Azcarraga, Arnulfo P.
author_sort Antioquia, Arren Matthew C.
title A two-stage feature selection algorithm based on redundancy and relevance
title_short A two-stage feature selection algorithm based on redundancy and relevance
title_full A two-stage feature selection algorithm based on redundancy and relevance
title_fullStr A two-stage feature selection algorithm based on redundancy and relevance
title_full_unstemmed A two-stage feature selection algorithm based on redundancy and relevance
title_sort two-stage feature selection algorithm based on redundancy and relevance
publisher Animo Repository
publishDate 2018
url https://animorepository.dlsu.edu.ph/faculty_research/4428
_version_ 1767196105191194624