SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points

We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) rather than just first-order stationary points, which may be bad, unstable saddle points. We show that a simple perturbed version of the stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal, since Fang et al. [2018] provided a lower bound of $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD for finding second-order stationary points is as simple as the version for finding first-order stationary points, requiring only the occasional addition of a uniform perturbation, whereas all other algorithms that find second-order stationary points with similar gradient complexity must be combined with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm admits a simpler analysis. Finally, we extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems and prove the corresponding convergence results.
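
For background, the two notions used in the abstract have standard meanings in this literature (general context, not text from the record): for a smooth nonconvex $f$, a point $x$ is an $\epsilon$-first-order stationary point if $\|\nabla f(x)\| \le \epsilon$, and an $(\epsilon,\delta)$-second-order stationary point if

    \|\nabla f(x)\| \le \epsilon \quad\text{and}\quad \lambda_{\min}\bigl(\nabla^{2} f(x)\bigr) \ge -\delta,

so the second condition additionally rules out directions of strong negative curvature, i.e., strict saddle points.

The following is a minimal Python sketch of the kind of perturbed recursive-gradient loop the abstract describes: a SARAH/SPIDER-style recursive gradient estimator restarted with a full gradient each epoch, plus an occasional uniform-ball perturbation when the gradient is small. It is an illustration under assumed interfaces (the helpers grad_full and grad_batch and the hyperparameters eta, b, m, r, g_thresh are placeholders chosen here), not the paper's exact algorithm or parameter settings.

import numpy as np

def ssrgd_sketch(grad_full, grad_batch, x0, n,
                 eta=0.05, b=32, m=None, r=1e-3, g_thresh=1e-2,
                 epochs=30, rng=None):
    """Illustrative perturbed recursive-gradient loop (a sketch, not the paper's
    exact SSRGD): grad_full(x) and grad_batch(x, idx) are assumed helpers that
    return the full and minibatch gradients of the finite-sum objective."""
    rng = np.random.default_rng() if rng is None else rng
    m = max(1, int(np.sqrt(n))) if m is None else m      # epoch length ~ sqrt(n)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        v = grad_full(x)                                  # restart the estimator with a full gradient
        if np.linalg.norm(v) <= g_thresh:
            # Small gradient: possibly near a saddle point, so perturb the iterate
            # uniformly within a ball of radius r before continuing.
            u = rng.normal(size=x.shape)
            u *= r * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(u)
            x = x + u
            v = grad_full(x)
        for _ in range(m):                                # inner epoch
            x_prev = x
            x = x - eta * v                               # descent step using the current estimator
            idx = rng.integers(0, n, size=b)              # sample a minibatch of component indices
            # SARAH/SPIDER-style recursive update: v <- grad_b(x_new) - grad_b(x_old) + v
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
    return x

A caller would supply grad_full(x) as the average of the component gradients and grad_batch(x, idx) as their average over the sampled indices; in SARAH-type analyses the epoch length and minibatch size are typically on the order of $\sqrt{n}$, which is where the $\sqrt{n}/\epsilon^2$ terms in bounds like those quoted above come from.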

Bibliographic Details
Main Author: LI, Zhize
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2019
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8679
https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9682
record_format dspace
spelling sg-smu-ink.sis_research-9682 2024-03-28T09:06:21Z
SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points
LI, Zhize
2019-12-01T08:00:00Z text application/pdf
https://ink.library.smu.edu.sg/sis_research/8679
https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng Institutional Knowledge at Singapore Management University
Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
description We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since Fang et al. [2018] provided a lower bound $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.
format text
author LI, Zhize
title SSRGD: Simple Stochastic Recursive Gradient Descent for escaping saddle points
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/8679
https://ink.library.smu.edu.sg/context/sis_research/article/9682/viewcontent/NeurIPS_2019_ssrgd_simple_stochastic_recursive_gradient_descent_for_escaping_saddle_points_Paper.pdf