A simple proximal stochastic gradient method for nonsmooth nonconvex optimization

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is the sum of a differentiable (possibly nonconvex) component and a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+: it recovers several existing convergence results and improves or generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., NIPS'17] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially settles an open problem posed in [Reddi et al., NIPS'16]. ProxSVRG+ also makes far fewer proximal oracle calls than ProxSVRG [Reddi et al., NIPS'16]. Moreover, for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that ProxSVRG+ achieves a global linear convergence rate without restarts, unlike ProxSVRG. Thus, it can automatically switch to the faster linear convergence in regions where the objective function satisfies the PL condition locally. Finally, we conduct several experiments, and the experimental results are consistent with the theory.
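
For concreteness, the composite finite-sum setting described in the abstract can be sketched in standard form as follows (the symbols f_i, h, eta, and mu are our notation for illustration and may differ from the paper's):

  \min_{x \in \mathbb{R}^d} \; F(x) := f(x) + h(x), \qquad f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),

where each f_i is smooth (possibly nonconvex) and h is convex but possibly nonsmooth. A proximal stochastic gradient step with a variance-reduced gradient estimate v_t then takes the form

  x_{t+1} = \mathrm{prox}_{\eta h}(x_t - \eta v_t), \qquad \mathrm{prox}_{\eta h}(y) = \operatorname*{arg\,min}_{x} \Big\{ h(x) + \frac{1}{2\eta} \|x - y\|^2 \Big\},

and, in the smooth case, the Polyak-Łojasiewicz condition with parameter \mu > 0 reads \|\nabla f(x)\|^2 \ge 2\mu \,(f(x) - f^*), which is the regime in which the linear convergence mentioned above applies; the paper's composite setting relies on a generalized form of this condition.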


Saved in:
Bibliographic Details
Main Authors: LI, Zhize, LI, Jian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8674
https://ink.library.smu.edu.sg/context/sis_research/article/9677/viewcontent/NeurIPS_2018_a_simple_proximal_stochastic_gradient_method_for_nonsmooth_nonconvex_optimization_Paper.pdf
Institution: Singapore Management University