SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points

We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) rather than just first-order stationary points, which may be bad, unstable saddle points. We show that a simple perturbed version of the stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal, since Fang et al. [2018] provided a lower bound of $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD for finding second-order stationary points is as simple as the version for finding first-order stationary points, requiring only the occasional addition of a uniform perturbation, whereas all other algorithms that find second-order stationary points with similar gradient complexity must be combined with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm admits a simpler analysis. Finally, we extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems and prove the corresponding convergence results.
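
For background, the two notions used in the abstract have standard meanings in this literature (general context, not text from the record): for a smooth nonconvex $f$, a point $x$ is an $\epsilon$-first-order stationary point if $\|\nabla f(x)\| \le \epsilon$, and an $(\epsilon,\delta)$-second-order stationary point if

    \|\nabla f(x)\| \le \epsilon \quad\text{and}\quad \lambda_{\min}\bigl(\nabla^{2} f(x)\bigr) \ge -\delta,

so the second condition additionally rules out directions of strong negative curvature, i.e., strict saddle points.

The following is a minimal Python sketch of the kind of perturbed recursive-gradient loop the abstract describes: a SARAH/SPIDER-style recursive gradient estimator restarted with a full gradient each epoch, plus an occasional uniform-ball perturbation when the gradient is small. It is an illustration under assumed interfaces (the helpers grad_full and grad_batch and the hyperparameters eta, b, m, r, g_thresh are placeholders chosen here), not the paper's exact algorithm or parameter settings.

import numpy as np

def ssrgd_sketch(grad_full, grad_batch, x0, n,
                 eta=0.05, b=32, m=None, r=1e-3, g_thresh=1e-2,
                 epochs=30, rng=None):
    """Illustrative perturbed recursive-gradient loop (a sketch, not the paper's
    exact SSRGD): grad_full(x) and grad_batch(x, idx) are assumed helpers that
    return the full and minibatch gradients of the finite-sum objective."""
    rng = np.random.default_rng() if rng is None else rng
    m = max(1, int(np.sqrt(n))) if m is None else m      # epoch length ~ sqrt(n)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        v = grad_full(x)                                  # restart the estimator with a full gradient
        if np.linalg.norm(v) <= g_thresh:
            # Small gradient: possibly near a saddle point, so perturb the iterate
            # uniformly within a ball of radius r before continuing.
            u = rng.normal(size=x.shape)
            u *= r * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(u)
            x = x + u
            v = grad_full(x)
        for _ in range(m):                                # inner epoch
            x_prev = x
            x = x - eta * v                               # descent step using the current estimator
            idx = rng.integers(0, n, size=b)              # sample a minibatch of component indices
            # SARAH/SPIDER-style recursive update: v <- grad_b(x_new) - grad_b(x_old) + v
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
    return x

A caller would supply grad_full(x) as the average of the component gradients and grad_batch(x, idx) as their average over the sampled indices; in SARAH-type analyses the epoch length and minibatch size are typically on the order of $\sqrt{n}$, which is where the $\sqrt{n}/\epsilon^2$ terms in bounds like those quoted above come from.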

Bibliographic Details
Main Author: LI, Zhize
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2019
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8679
https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9682
record_format dspace
spelling sg-smu-ink.sis_research-9682 2024-03-28T09:06:21Z
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points
LI, Zhize
2019-12-01T08:00:00Z text application/pdf
https://ink.library.smu.edu.sg/sis_research/8679
https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng Institutional Knowledge at Singapore Management University
Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
description We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since Fang et al. [2018] provided a lower bound $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.
format text
author LI, Zhize
title SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/8679
https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf