Dimension reduction and classifier-based feature selection for oversampled gene expression data and cancer classification

Bibliographic Details
Main Authors:	Petinrin, Olutomilayo Olayemi; Saeed, Faisal; Salim, Naomie; Muhammad Toseef, Muhammad Toseef; Liu, Zhe; Muyide, Ibukun Omotayo
Format:	Article
Language:	English
Published:	MDPI 2023
Subjects:	Q Science (General); QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/106539/1/NaomieSalim2023_DimensionReductionandClassifierBasedFeature.pdf
	http://eprints.utm.my/106539/
	http://dx.doi.org/10.3390/pr11071940
Institution:	Universiti Teknologi Malaysia

Journal:	Processes, 11 (7), pp. 1-13 (July 2023)
ISSN:	2227-9717
DOI:	10.3390/pr11071940
Status:	Peer reviewed
Citation:	Petinrin, Olutomilayo Olayemi and Saeed, Faisal and Salim, Naomie and Muhammad Toseef, Muhammad Toseef and Liu, Zhe and Muyide, Ibukun Omotayo (2023) Dimension reduction and classifier-based feature selection for oversampled gene expression data and cancer classification. Processes, 11 (7). pp. 1-13. ISSN 2227-9717
Record ID:	my.utm.106539
Repository:	UTM Institutional Repository (UTM Library), http://eprints.utm.my/
Country:	Malaysia
Last modified:	2024-07-09
Description:	Gene expression data typically have a large number of features, some of which are irrelevant or redundant. In some cases, however, all features, despite being numerous, are important and contribute to the data analysis. Gene expression data also often have few instances and a high degree of class imbalance. This can limit a classification model's exposure to instances of the different categories and thereby affect its performance. In this study, we proposed a cancer detection approach that combines data preprocessing techniques, namely oversampling and feature selection, with classification models. The study used SVMSMOTE to oversample the six examined datasets. Further, we examined feature selection techniques based on dimension reduction methods and on classifier-based feature ranking and selection. We trained six machine learning algorithms using repeated 5-fold cross-validation on different microarray datasets. The performance of the algorithms varied with the dataset and the feature reduction technique used.
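
The record describes the method only at the level of the abstract. The following is a minimal pure-Python sketch, not the authors' implementation, of one common way to combine oversampling with repeated stratified 5-fold cross-validation. It rests on several assumptions. It uses plain SMOTE interpolation as a simplified stand-in for SVMSMOTE, which additionally fits an SVM to decide which minority samples to interpolate from. It uses a nearest-centroid classifier in place of the six algorithms and toy two-feature data, and it omits the feature selection step. All function names are hypothetical.

```python
import random
from collections import defaultdict


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE (stand-in for SVMSMOTE): place n_new synthetic points on segments
    joining a random minority sample to one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(0) if rng is None else rng
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: sqdist(base, m))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, nn)])
    return synthetic


def oversample(X, y, rng):
    """Top up every smaller class to the size of the largest class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < target:
            new = smote(rows, target - len(rows), rng=rng)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


def repeated_stratified_kfold(y, n_splits=5, n_repeats=3, seed=0):
    """Yield (train, test) index lists. Each repeat reshuffles every class and
    deals its indices round-robin across folds, preserving class ratios."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        by_class = defaultdict(list)
        for i, label in enumerate(y):
            by_class[label].append(i)
        folds, pos = [[] for _ in range(n_splits)], 0
        for label in sorted(by_class):
            idx = by_class[label]
            rng.shuffle(idx)
            for i in idx:
                folds[pos % n_splits].append(i)
                pos += 1
        for fold in folds:
            test = set(fold)
            yield [i for i in range(len(y)) if i not in test], sorted(test)


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: sqdist(centroids[c], x))


def cv_accuracy(X, y, n_splits=5, n_repeats=3, seed=0):
    """Mean accuracy over repeated stratified k-fold CV. Only the training
    part of each split is oversampled, so test folds hold real samples only."""
    rng = random.Random(seed)
    scores = []
    for train, test in repeated_stratified_kfold(y, n_splits, n_repeats, seed):
        X_tr, y_tr = oversample([X[i] for i in train], [y[i] for i in train], rng)
        centroids = fit_centroids(X_tr, y_tr)
        correct = sum(predict(centroids, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gen = random.Random(42)
    # Toy imbalanced data (hypothetical): 20 samples of class 0, 6 of class 1.
    X = [[gen.gauss(0, 0.5), gen.gauss(0, 0.5)] for _ in range(20)]
    X += [[gen.gauss(5, 0.5), gen.gauss(5, 0.5)] for _ in range(6)]
    y = [0] * 20 + [1] * 6
    print(f"mean accuracy over 3 x 5 folds: {cv_accuracy(X, y):.3f}")
```

In this sketch, oversampling runs after each split and touches only the training indices, so synthetic samples are never interpolated from test-fold samples. The record does not say at which stage the authors applied SVMSMOTE. For real experiments, the imbalanced-learn library provides an `SVMSMOTE` class, and scikit-learn provides `RepeatedStratifiedKFold`.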

Similar Items