Efficient gradient support pursuit with less hard thresholding for cardinality-constrained learning

Recently, stochastic hard thresholding (HT) optimization methods [e.g., stochastic variance reduced gradient hard thresholding (SVRGHT)] are becoming more attractive for solving large-scale sparsity/rank-constrained problems. However, they have much higher HT oracle complexities, especially for high...

Full description

Saved in:
Bibliographic Details
Main Authors: SHANG, Fanhua, WEI, Bingkun, LIU, Hongying, LIU, Yuanyuan, ZHOU, Pan, GONG, Maoguo
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9049
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Recently, stochastic hard thresholding (HT) optimization methods [e.g., stochastic variance reduced gradient hard thresholding (SVRGHT)] are becoming more attractive for solving large-scale sparsity/rank-constrained problems. However, they have much higher HT oracle complexities, especially for high-dimensional data or large-scale matrices. To address this issue and inspired by the well-known Gradient Support Pursuit (GraSP) method, this article proposes a new Relaxed Gradient Support Pursuit (RGraSP) framework. Unlike GraSP, RGraSP only requires to yield an approximation solution at each iteration. Based on the property of RGraSP, we also present an efficient stochastic variance reduction-gradient support pursuit algorithm and its fast version (called stochastic variance reduced gradient support pursuit (SVRGSP+). We prove that the gradient oracle complexity of both our algorithms is two times less than that of SVRGHT. In particular, their HT complexity is about κsˆ times less than that of SVRGHT, where κsˆ is the restricted condition number. Moreover, we prove that our algorithms enjoy fast linear convergence to an approximately global optimum, and also present an asynchronous parallel variant to deal with very high-dimensional and sparse data. Experimental results on both synthetic and real-world datasets show that our algorithms yield superior results than the state-of-the-art gradient HT methods.