Supervised feature selection using principal component analysis

Principal component analysis (PCA) is widely used in computational fields such as computer science, pattern recognition, and machine learning because it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method used for feature...

Bibliographic Details
Main Authors: Rahmat, Fariq; Zulkafli, Zed; Ishak, Asnor Juraiza; Abdul Rahman, Ribhan Zafira; Stercke, Simon De; Buytaert, Wouter; Tahir, Wardah; Ab Rahman, Jamalludin; Ibrahim, Salwa; Ismail, Muhamad
Format: Article
Published: Springer Science and Business Media Deutschland GmbH 2023
Online Access: http://psasir.upm.edu.my/id/eprint/110338/
https://link.springer.com/article/10.1007/s10115-023-01993-5
Institution: Universiti Putra Malaysia
id my.upm.eprints.110338
record_format eprints
citation Rahmat, Fariq and Zulkafli, Zed and Ishak, Asnor Juraiza and Abdul Rahman, Ribhan Zafira and Stercke, Simon De and Buytaert, Wouter and Tahir, Wardah and Ab Rahman, Jamalludin and Ibrahim, Salwa and Ismail, Muhamad (2023) Supervised feature selection using principal component analysis. Knowledge and Information Systems, 66 (3). pp. 1955-1995. ISSN 0219-1377; eISSN: 0219-3116. Published 2023-11; peer reviewed. Record last modified 2024-11-11T01:43:44Z.
doi 10.1007/s10115-023-01993-5
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
description Principal component analysis (PCA) is widely used in computational fields such as computer science, pattern recognition, and machine learning because it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method for feature extraction. In this study, we explore PCA’s ability to perform feature selection in regression applications. We introduce a new PCA-based approach, called Targeted PCA, that analyzes a multivariate dataset that includes the dependent variable: it identifies the principal component with a high representation of the dependent variable and then examines that component to capture and rank the contributions of the non-dependent variables. The study also compares the selected features with those resulting from Least Absolute Shrinkage and Selection Operator (LASSO) regression. Finally, the selected features were tested in two regression models: multiple linear regression (MLR) and an artificial neural network (ANN). The results are presented for three datasets from the socioeconomic, environmental, and computer image processing domains. Our study found that for 2 of the 3 random datasets, the features selected by the PCA and LASSO regression methods were more than 50% similar. In the regression predictions, the PCA-selected features made little difference compared to the LASSO-selected features in terms of MLR prediction accuracy. However, the ANN regression demonstrated faster convergence and a greater reduction of error. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023.
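The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name targeted_pca_rank is hypothetical, and two choices are assumptions made here for illustration, namely that PCA is run on the correlation matrix of the predictors plus the dependent variable, and that "high representation of the dependent variable" means the component on which the dependent variable has the largest absolute loading.

```python
import numpy as np


def targeted_pca_rank(X, y, feature_names):
    """Rank predictors by their loading on the component that best represents y.

    Hypothetical helper, an illustrative reading of the abstract only.
    Assumes no column of X (or y) is constant, since the correlation
    matrix would then contain NaN values.
    """
    # Put the dependent variable in the last column so PCA sees it too.
    data = np.column_stack([X, y]).astype(float)

    # PCA on the correlation matrix (equivalent to PCA on standardized data).
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    # Assumption: the "targeted" component is the one on which the
    # dependent variable (last row) has the largest absolute loading.
    k = int(np.argmax(np.abs(loadings[-1])))

    # Rank the predictors by their absolute loading on that component.
    scores = np.abs(loadings[:-1, k])
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order]
```

Calling targeted_pca_rank(X, y, names) with an (n_samples, n_features) array X and a length-n_samples vector y returns (name, score) pairs sorted from the strongest to the weakest contribution. How many of the top-ranked features to keep is left to the caller, since the abstract does not state a cutoff.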
format Article
author Rahmat, Fariq
Zulkafli, Zed
Ishak, Asnor Juraiza
Abdul Rahman, Ribhan Zafira
Stercke, Simon De
Buytaert, Wouter
Tahir, Wardah
Ab Rahman, Jamalludin
Ibrahim, Salwa
Ismail, Muhamad