Online feature selection and its applications

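Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require access to all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications in which data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS), in which an online learner is only allowed to maintain a classifier involving a small, fixed number of features. The key challenge of online feature selection is how to make accurate predictions for an instance using a small number of active features. This is in contrast to the classical setup of online learning, where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: 1) learning with full input, where a learner is allowed to access all the features to decide the subset of active features, and 2) learning with partial input, where the learner is allowed to access only a limited number of features for each instance. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public data sets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.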
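
As a rough illustration of the truncation idea named in the abstract, the Python sketch below shows one way an online learner could keep at most a fixed number of active features in the "learning with full input" setting. It is a minimal sketch under stated assumptions, not a reproduction of the authors' algorithms: the function names (truncate, ofs_truncation, make_stream), the step size eta, the regularization value lam, and the synthetic data stream are all hypothetical choices made for this example.

```python
import math
import random


def truncate(w, b):
    """Keep the b largest-magnitude weights and set the rest to zero."""
    if b >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:b])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def ofs_truncation(stream, dim, b, eta=0.2, lam=0.01):
    """Online linear classifier with at most b nonzero weights.

    stream yields (x, y) pairs: x is a list of dim floats, y is -1 or +1.
    eta and lam are illustrative values, not tuned settings.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin <= 0:  # update only when the current model errs
            mistakes += 1
            # Gradient-style step using all features of x (full input).
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            # Shrink w into an L2 ball of radius 1/sqrt(lam) if needed.
            norm = math.sqrt(sum(wi * wi for wi in w))
            radius = 1.0 / math.sqrt(lam)
            if norm > radius:
                w = [wi * radius / norm for wi in w]
            # Enforce the feature budget by truncation.
            w = truncate(w, b)
    return w, mistakes


def make_stream(n, dim, informative, seed=0):
    """Synthetic stream: the label depends only on the informative indices."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = 1 if sum(x[i] for i in informative) >= 0 else -1
        yield x, y


w, mistakes = ofs_truncation(make_stream(2000, 50, [0, 1, 2]), dim=50, b=5)
active = [i for i, wi in enumerate(w) if wi != 0.0]
print("active features:", active)
print("online mistakes:", mistakes)
```

Truncating after every update is what keeps the classifier within the budget of b features at all times; the partial-input setting described in the abstract would additionally restrict which entries of x the learner may read, which this sketch does not attempt.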

Bibliographic Details
Main Authors: WANG, Jialei, ZHAO, Peilin, HOI, Steven C. H., JIN, Rong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2014
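Subjects: Feature selection; online learning; large-scale data mining; classification; big data analytics; Computer Sciences; Databases and Information Systems; Numerical Analysis and Scientific Computing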
Online Access: https://ink.library.smu.edu.sg/sis_research/2277
https://ink.library.smu.edu.sg/context/sis_research/article/3277/viewcontent/Online_Feature_Selection_and_Its_Applications.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3277
record_format dspace
spelling sg-smu-ink.sis_research-3277 2021-03-12T07:22:43Z 2014-03-01T08:00:00Z text application/pdf info:doi/10.1109/TKDE.2013.32 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Feature selection
online learning
large-scale data mining
classification
big data analytics
Computer Sciences
Databases and Information Systems
Numerical Analysis and Scientific Computing
description Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require access to all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications in which data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS), in which an online learner is only allowed to maintain a classifier involving a small, fixed number of features. The key challenge of online feature selection is how to make accurate predictions for an instance using a small number of active features. This is in contrast to the classical setup of online learning, where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: 1) learning with full input, where a learner is allowed to access all the features to decide the subset of active features, and 2) learning with partial input, where the learner is allowed to access only a limited number of features for each instance. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public data sets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.
format text
author WANG, Jialei
ZHAO, Peilin
HOI, Steven C. H.
JIN, Rong
author_sort WANG, Jialei
title Online feature selection and its applications
publisher Institutional Knowledge at Singapore Management University
publishDate 2014
url https://ink.library.smu.edu.sg/sis_research/2277
https://ink.library.smu.edu.sg/context/sis_research/article/3277/viewcontent/Online_Feature_Selection_and_Its_Applications.pdf
_version_ 1770572071407976448