Joint structured bipartite graph and row-sparse projection for large-scale feature selection

Bibliographic Details
Main Authors: Dong, Xia; Nie, Feiping; Wu, Danyang; Wang, Rong; Li, Xuelong
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/175915
Institution: Nanyang Technological University
Description
Summary: Feature selection plays an important role in data analysis, yet traditional graph-based methods often produce suboptimal results. These methods typically follow a two-stage process: constructing a graph with data-to-data affinities or a bipartite graph with data-to-anchor affinities, and then independently selecting features based on their scores. In this article, a large-scale feature selection approach based on a structured bipartite graph and row-sparse projection (RS2BLFS) is proposed to overcome this limitation. RS2BLFS integrates the construction of a structured bipartite graph consisting of c connected components into row-sparse projection learning with k nonzero rows. This integration allows for the joint selection of an optimal feature subset in an unsupervised manner. Notably, the c connected components of the structured bipartite graph correspond to c clusters, each with multiple subcluster centers. This property makes RS2BLFS particularly effective for feature selection and clustering on nonspherical large-scale data. An algorithm with theoretical analysis is developed to solve the optimization problem involved in RS2BLFS. Experimental results on synthetic and real-world datasets confirm its effectiveness in feature selection tasks.
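The two structural ideas named in the summary can be illustrated with a small sketch. This is not the RS2BLFS optimization algorithm, which the record does not describe. It only shows, under assumed toy inputs, how a projection matrix with k nonzero rows identifies k selected features, and how the connected components of a data-to-anchor bipartite graph yield cluster labels. The function names `selected_features` and `bipartite_components`, and the matrices `W` and `B`, are hypothetical and chosen for illustration.

```python
import numpy as np


def selected_features(W, k):
    """Return the indices of the k rows of W with the largest l2 norm.

    For a row-sparse projection with exactly k nonzero rows, these are
    the indices of the selected features.
    """
    row_norms = np.linalg.norm(W, axis=1)
    return np.sort(np.argsort(row_norms)[-k:])


def bipartite_components(B):
    """Label connected components of a bipartite graph.

    B is an (n, m) nonnegative affinity matrix between n data points
    and m anchors. Returns (data_labels, anchor_labels).
    """
    n, m = B.shape
    labels = -np.ones(n + m, dtype=int)  # nodes 0..n-1 are data, n..n+m-1 anchors
    current = 0
    for start in range(n + m):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            if u < n:
                nbrs = n + np.flatnonzero(B[u] > 0)
            else:
                nbrs = np.flatnonzero(B[:, u - n] > 0)
            for v in nbrs:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels[:n], labels[n:]


# Toy projection: 5 features, 2 dimensions, only rows 1 and 3 are nonzero (k = 2).
W = np.zeros((5, 2))
W[1] = [1.0, -0.5]
W[3] = [0.2, 0.9]

# Toy bipartite graph: 4 data points, 3 anchors, c = 2 connected components.
B = np.array([
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])

features = selected_features(W, k=2)
data_labels, anchor_labels = bipartite_components(B)
```

In this toy graph the first component contains two anchors, which mirrors the summary's point that one cluster may have multiple subcluster centers.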