SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points
We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of...
Saved in:
Main Author: | |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2019
|
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8679 https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Singapore Management University |
Language: | English |
id |
sg-smu-ink.sis_research-9682 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-96822024-03-28T09:06:21Z SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points LI, Zhize We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since Fang et al. [2018] provided a lower bound $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results. 2019-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8679 https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems |
spellingShingle |
Databases and Information Systems LI, Zhize SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points |
description |
We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since Fang et al. [2018] provided a lower bound $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results. |
format |
text |
author |
LI, Zhize |
author_facet |
LI, Zhize |
author_sort |
LI, Zhize |
title |
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points |
title_short |
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points |
title_full |
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points |
title_fullStr |
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points |
title_full_unstemmed |
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points |
title_sort |
ssrgd: simple stochastic recursive gradient descent for escaping saddle points |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2019 |
url |
https://ink.library.smu.edu.sg/sis_research/8679 https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf |
_version_ |
1795302170223968256 |