Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization

We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem posed in Reddi et al. (2016b). Moreover, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. Then, we propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017), and show that SSRGD further improves the gradient complexity of ProxSVRG+ and achieves the optimal upper bound, matching the known lower bound of Fang et al. (2018) and Li et al. (2021). We also show that both ProxSVRG+ and SSRGD adapt automatically to the local structure of the objective function, such as the Polyak-\L{}ojasiewicz (PL) condition for nonconvex functions in the finite-sum case; that is, we prove that both of them can automatically switch to faster global linear convergence without the restarts used in the prior work ProxSVRG (Reddi et al., 2016b). Finally, we focus on the more challenging problem of finding an $(\epsilon, \delta)$-local minimum rather than just an $\epsilon$-approximate (first-order) stationary point (which may be a bad, unstable saddle point). We show that SSRGD can find an $(\epsilon, \delta)$-local minimum by simply adding random perturbations. The resulting algorithm is almost as simple as its counterpart for finding stationary points and achieves similar optimal rates.
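
To make the variance-reduction-plus-proximal-step idea in the abstract concrete, here is a minimal Python sketch of an SVRG-style proximal loop. It is only an illustration under assumptions not stated in this record: it assumes an $\ell_1$ regularizer (handled by soft-thresholding), and `grad_i`, the step size `eta`, the batch sizes `B` and `b`, and the epoch length `m` are hypothetical placeholders rather than the authors' ProxSVRG+ pseudocode or tuned parameters.

```python
import numpy as np

def prox_l1(x, lam, eta):
    """Proximal operator of eta * lam * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0.0)

def prox_vr_sgd_sketch(grad_i, n, x0, eta=0.01, lam=0.001,
                       B=64, b=8, epochs=20, m=10, rng=None):
    """
    Illustrative SVRG-style proximal loop for
        min_x (1/n) * sum_i f_i(x) + lam * ||x||_1.
    grad_i(i, x) returns the gradient of component f_i at x.
    All constants are placeholders, not the paper's choices.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(epochs):
        # Snapshot point and a minibatch gradient at the snapshot
        # (avoids a full gradient pass, as in the online setting).
        x_tilde = x.copy()
        S = rng.choice(n, size=min(B, n), replace=False)
        g_tilde = np.mean([grad_i(i, x_tilde) for i in S], axis=0)
        for _ in range(m):
            # Variance-reduced gradient estimator on a small minibatch.
            I = rng.choice(n, size=min(b, n), replace=False)
            v = np.mean([grad_i(i, x) - grad_i(i, x_tilde) for i in I],
                        axis=0) + g_tilde
            # Proximal step handles the nonsmooth regularizer.
            x = prox_l1(x - eta * v, lam, eta)
    return x
```

The point the abstract emphasizes is that such a loop keeps the number of proximal calls and full-gradient computations small while retaining low-variance gradient estimates; the precise complexity bounds and parameter settings are given in the paper itself.
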


Bibliographic Details
Main Authors: LI, Zhize, LI, Jian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/8692
https://ink.library.smu.edu.sg/context/sis_research/article/9695/viewcontent/JMLR22_nonsmooth__1_.pdf
id sg-smu-ink.sis_research-9695
record_format dspace
spelling sg-smu-ink.sis_research-9695 2024-03-28T08:43:21Z Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization LI, Zhize LI, Jian We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem posed in Reddi et al. (2016b). Moreover, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. Then, we propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017), and show that SSRGD further improves the gradient complexity of ProxSVRG+ and achieves the optimal upper bound, matching the known lower bound of Fang et al. (2018) and Li et al. (2021). We also show that both ProxSVRG+ and SSRGD adapt automatically to the local structure of the objective function, such as the Polyak-\L{}ojasiewicz (PL) condition for nonconvex functions in the finite-sum case; that is, we prove that both of them can automatically switch to faster global linear convergence without the restarts used in the prior work ProxSVRG (Reddi et al., 2016b). Finally, we focus on the more challenging problem of finding an $(\epsilon, \delta)$-local minimum rather than just an $\epsilon$-approximate (first-order) stationary point (which may be a bad, unstable saddle point). We show that SSRGD can find an $(\epsilon, \delta)$-local minimum by simply adding random perturbations. The resulting algorithm is almost as simple as its counterpart for finding stationary points and achieves similar optimal rates. 2022-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8692 info:doi/10.5555/3586589.3586828 https://ink.library.smu.edu.sg/context/sis_research/article/9695/viewcontent/JMLR22_nonsmooth__1_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University nonconvex optimization optimal algorithm proximal gradient descent variance reduction local minimum Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic nonconvex optimization
optimal algorithm
proximal gradient descent
variance reduction
local minimum
Databases and Information Systems
spellingShingle nonconvex optimization
optimal algorithm
proximal gradient descent
variance reduction
local minimum
Databases and Information Systems
LI, Zhize
LI, Jian
Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
description We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem posed in Reddi et al. (2016b). Moreover, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. Then, we propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017), and show that SSRGD further improves the gradient complexity of ProxSVRG+ and achieves the optimal upper bound, matching the known lower bound of Fang et al. (2018) and Li et al. (2021). We also show that both ProxSVRG+ and SSRGD adapt automatically to the local structure of the objective function, such as the Polyak-\L{}ojasiewicz (PL) condition for nonconvex functions in the finite-sum case; that is, we prove that both of them can automatically switch to faster global linear convergence without the restarts used in the prior work ProxSVRG (Reddi et al., 2016b). Finally, we focus on the more challenging problem of finding an $(\epsilon, \delta)$-local minimum rather than just an $\epsilon$-approximate (first-order) stationary point (which may be a bad, unstable saddle point). We show that SSRGD can find an $(\epsilon, \delta)$-local minimum by simply adding random perturbations. The resulting algorithm is almost as simple as its counterpart for finding stationary points and achieves similar optimal rates.
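
As an aside on the last claim in the description above (finding a local minimum by adding random perturbations), the following schematic Python sketch illustrates the generic perturbation idea, not the paper's SSRGD pseudocode: whenever the gradient estimate is small, inject a small random perturbation and keep descending so that strict saddle points can be escaped. The gradient oracle `grad`, the threshold `eps`, the perturbation `radius`, and the step size `eta` are all hypothetical placeholders.

```python
import numpy as np

def perturbed_descent_sketch(grad, x0, eta=0.01, eps=1e-3, radius=1e-3,
                             max_iters=10_000, rng=None):
    """
    Schematic of the perturbation idea: gradient descent that injects a small
    random perturbation whenever the gradient is small, so that strict saddle
    points (gradient nearly zero, but a negative-curvature direction exists)
    can be escaped. Thresholds, radii, and step size are illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            # Near a first-order stationary point: perturb within a small
            # ball and keep descending; near a strict saddle, the iterate
            # then drifts away along a negative-curvature direction.
            xi = rng.normal(size=x.shape)
            x = x + radius * xi / np.linalg.norm(xi)
        else:
            x = x - eta * g
    return x
```
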
format text
author LI, Zhize
LI, Jian
author_facet LI, Zhize
LI, Jian
author_sort LI, Zhize
title Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
title_short Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
title_full Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
title_fullStr Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
title_full_unstemmed Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
title_sort simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/8692
https://ink.library.smu.edu.sg/context/sis_research/article/9695/viewcontent/JMLR22_nonsmooth__1_.pdf
_version_ 1795302174379474944