Online feature selection for mining big data

Most studies of online learning require access to all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or when it is expensive to acquire the full set of attributes/fe...

Bibliographic Details
Main Authors: HOI, Steven C. H., WANG, Jialei, ZHAO, Peilin, JIN, Rong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
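Subjects: Feature Selection; Online Learning; Classification; Computer Sciences; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/2402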
https://ink.library.smu.edu.sg/context/sis_research/article/3402/viewcontent/OFS.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3402
record_format dspace
spelling sg-smu-ink.sis_research-3402 2021-03-12T07:23:15Z 2012-08-01T07:00:00Z text application/pdf info:doi/10.1145/2351316.2351329 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Feature Selection
Online Learning
Classification
Computer Sciences
Databases and Information Systems
Numerical Analysis and Scientific Computing
description Most studies of online learning require access to all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or when it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which the online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. This is in contrast to the classical setup of online learning, where all the features are active and can be used for prediction. We address this challenge by studying sparsity regularization and truncation techniques. Specifically, we present an effective algorithm to solve the problem, give its theoretical analysis, and evaluate the empirical performance of the proposed algorithms for online feature selection on several public datasets. We also demonstrate the application of our online feature selection technique to real-world problems of big data mining, where it is significantly more scalable than some well-known batch feature selection algorithms. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques for large-scale applications.
format text
author HOI, Steven C. H.
WANG, Jialei
ZHAO, Peilin
JIN, Rong
author_sort HOI, Steven C. H.
title Online feature selection for mining big data
publisher Institutional Knowledge at Singapore Management University
publishDate 2012
url https://ink.library.smu.edu.sg/sis_research/2402
https://ink.library.smu.edu.sg/context/sis_research/article/3402/viewcontent/OFS.pdf
_version_ 1770572135120502784
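
Illustrative sketch: the description in this record says the work studies sparsity regularization and truncation so that the classifier uses only a small, fixed number of active features. The Python below is a minimal sketch of that general idea, not the authors' published algorithm. It assumes a hinge-loss gradient step, an L2 shrinkage term, a projection onto a norm ball, and a hard truncation that keeps the largest-magnitude weights. The names truncate, ofs_sketch, eta, lam, DIM and BUDGET, and the synthetic data, are all hypothetical and chosen only for illustration.

```python
import math
import random


def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:budget])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def ofs_sketch(stream, dim, budget, eta=0.2, lam=0.01):
    """Online linear classifier restricted to at most `budget` nonzero weights.

    Assumed update (illustrative only): hinge-loss gradient step with L2
    shrinkage, projection onto a ball of radius 1/sqrt(lam), then truncation.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:  # y is +1 or -1
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin <= 0:
            mistakes += 1
        if margin < 1.0:  # hinge loss is active, so update the weights
            w = [(1.0 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            radius = 1.0 / math.sqrt(lam)
            if norm > radius:
                w = [wi * radius / norm for wi in w]
            w = truncate(w, budget)
    return w, mistakes


# Synthetic stream (hypothetical): the label depends only on features 0 and 1.
random.seed(0)
DIM, BUDGET = 20, 3


def make_example():
    x = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
    y = 1 if x[0] - x[1] > 0 else -1
    return x, y


stream = [make_example() for _ in range(500)]
w, mistakes = ofs_sketch(stream, DIM, BUDGET)
active = [i for i, wi in enumerate(w) if wi != 0.0]
print("active features:", active, "mistakes:", mistakes)
```

Because truncation runs after every update, the weight vector never holds more than BUDGET nonzero entries, which is the constraint the description refers to as a small and fixed number of active features.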