Efficient gradient support pursuit with less hard thresholding for cardinality-constrained learning
Main Authors:
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/9049
Institution: Singapore Management University
Summary: Recently, stochastic hard thresholding (HT) optimization methods [e.g., stochastic variance reduced gradient hard thresholding (SVRGHT)] have become attractive for solving large-scale sparsity/rank-constrained problems. However, they incur much higher HT oracle complexities, especially for high-dimensional data or large-scale matrices. To address this issue, and inspired by the well-known Gradient Support Pursuit (GraSP) method, this article proposes a new Relaxed Gradient Support Pursuit (RGraSP) framework. Unlike GraSP, RGraSP only requires an approximate solution at each iteration. Based on this property of RGraSP, we also present an efficient stochastic variance reduced gradient support pursuit algorithm (SVRGSP) and its fast version (SVRGSP+). We prove that the gradient oracle complexity of both our algorithms is half that of SVRGHT. In particular, their HT complexity is about κŝ times lower than that of SVRGHT, where κŝ is the restricted condition number. Moreover, we prove that our algorithms enjoy fast linear convergence to an approximately global optimum, and we also present an asynchronous parallel variant to handle very high-dimensional and sparse data. Experimental results on both synthetic and real-world datasets show that our algorithms outperform state-of-the-art gradient HT methods.
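For readers unfamiliar with the HT oracle the summary refers to, below is a minimal, illustrative sketch in Python. It is not the authors' RGraSP/SVRGSP implementation: the least-squares loss, the function names (hard_threshold, svrg_ht), the step size eta, and all hyperparameters are our own assumptions. It shows the top-s hard thresholding operator and a generic SVRGHT-style loop that calls the HT oracle after every inner step, which is the per-iteration cost that support-pursuit methods such as SVRGSP/SVRGSP+ aim to reduce.

```python
import numpy as np

def hard_threshold(x, s):
    # HT oracle: keep the s largest-magnitude entries of x, zero the rest.
    # Assumes 1 <= s <= len(x).
    out = np.zeros_like(x)
    keep = np.argpartition(np.abs(x), -s)[-s:]
    out[keep] = x[keep]
    return out

def svrg_ht(A, b, s, eta=0.005, epochs=30, inner=None, seed=0):
    # Illustrative SVRG-style hard-thresholding loop (assumed setup) for
    #   min_x (1/2n) ||A x - b||^2   s.t.   ||x||_0 <= s.
    # SVRGHT-type methods invoke the HT oracle at *every* inner step;
    # support-pursuit variants apply it far less often, which is the
    # source of the HT-complexity savings claimed in the summary.
    n, d = A.shape
    inner = inner or n
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(epochs):
        x_ref = x.copy()
        full_grad = A.T @ (A @ x_ref - b) / n   # full gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)
            a_i = A[i]
            # variance-reduced stochastic gradient at the current iterate
            g = a_i * (a_i @ x - b[i]) - a_i * (a_i @ x_ref - b[i]) + full_grad
            x = hard_threshold(x - eta * g, s)  # HT oracle every inner step
    return x

# Toy usage on synthetic data (illustrative only):
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = hard_threshold(rng.standard_normal(50), 5)
b = A @ x_true
x_hat = svrg_ht(A, b, s=5)
```

The HT oracle is simply a projection onto the set of s-sparse vectors, so its cost is dominated by the partial sort over d coordinates; reducing how often it is invoked matters most when the data is very high-dimensional, which is the regime the summary highlights.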