A fast Anderson-Chebyshev acceleration for nonlinear optimization

Anderson acceleration (or Anderson mixing) is an efficient acceleration method for fixed-point iterations $x_{t+1}=G(x_t)$; e.g., gradient descent can be viewed as iteratively applying the operator $G(x) \triangleq x-\alpha\nabla f(x)$. Anderson acceleration is known to be quite efficient in practice and can be viewed as an extension of Krylov subspace methods to nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomials achieves the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves on the previous $O(\kappa\ln\frac{1}{\epsilon})$ result of Toth and Kelley (2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. In addition, if the hyperparameters (e.g., the Lipschitz smoothness parameter $L$) are not available, we propose an algorithm that guesses them dynamically and prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD) and Nesterov's accelerated GD. Furthermore, these algorithms combined with the proposed guessing algorithm (which guesses the hyperparameters dynamically) achieve much better performance.
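To make the fixed-point view concrete, here is a minimal sketch (not the paper's implementation) of Anderson acceleration applied to the gradient-descent map $G(x)=x-\alpha\nabla f(x)$, using the standard least-squares mixing of the most recent residuals. The function name `anderson_gd`, the step size `alpha`, the memory `m`, and the small regularizer `reg` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def anderson_gd(grad_f, x0, alpha=0.1, m=5, iters=100, reg=1e-10):
    """Anderson acceleration of the fixed-point map G(x) = x - alpha * grad_f(x).

    Minimal type-II Anderson mixing sketch: keep the last m+1 iterates and
    their residuals r_i = G(x_i) - x_i, solve a small regularized
    least-squares problem for mixing weights that sum to 1, and combine the
    corresponding G(x_i). Hyperparameters are illustrative only.
    """
    G = lambda x: x - alpha * grad_f(x)
    xs, gs = [x0], [G(x0)]
    x = gs[-1]
    for _ in range(iters):
        xs.append(x)
        gs.append(G(x))
        xs, gs = xs[-(m + 1):], gs[-(m + 1):]                    # bounded memory
        R = np.stack([g - y for g, y in zip(gs, xs)], axis=1)    # residual matrix, shape (d, k)
        k = R.shape[1]
        # Minimize ||R c||^2 subject to sum(c) = 1: c is proportional to
        # (R^T R + reg I)^{-1} * ones, then normalized.
        c = np.linalg.solve(R.T @ R + reg * np.eye(k), np.ones(k))
        c /= c.sum()
        x = sum(ci * gi for ci, gi in zip(c, gs))                # mixed update
    return x

# Toy usage on a strongly convex quadratic f(x) = 0.5 * x^T Q x (hypothetical example):
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
Q = M.T @ M + np.eye(20)                      # positive definite Hessian
grad_f = lambda x: Q @ x                      # gradient; minimizer is x* = 0
x = anderson_gd(grad_f, rng.standard_normal(20), alpha=1.0 / np.linalg.norm(Q, 2))
print("||x - x*|| =", float(np.linalg.norm(x)))
```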

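The $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$ rate quoted in the abstract is the accelerated rate attainable with Chebyshev polynomials on quadratics. For reference only, the sketch below shows the textbook Chebyshev step-size schedule for gradient descent on a quadratic with Hessian spectrum in $[\mu, L]$ (this is not the paper's combined Anderson-Chebyshev method, and `chebyshev_gd` is a hypothetical helper name).

```python
import numpy as np

def chebyshev_gd(grad_f, x0, mu, L, T):
    """Gradient descent with the classical Chebyshev step-size schedule.

    For a quadratic whose Hessian eigenvalues lie in [mu, L], taking the T
    step sizes to be the reciprocals of the roots of the degree-T Chebyshev
    polynomial mapped to [mu, L] yields the accelerated
    O(sqrt(L/mu) * log(1/eps)) behaviour. Sketch only.
    """
    ks = np.arange(T)
    # Chebyshev roots cos((2k+1)pi/(2T)) on [-1, 1], mapped affinely to [mu, L].
    roots = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * ks + 1) * np.pi / (2 * T))
    x = x0
    for lam in roots:
        x = x - (1.0 / lam) * grad_f(x)  # one gradient step with step size 1/lam
    return x
```

In exact arithmetic the ordering of the roots does not matter; in floating point, larger $T$ typically calls for a stability-aware ordering or the equivalent two-term Chebyshev recurrence.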

Bibliographic Details
Main Authors: LI, Zhize, LI, Jian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8680
https://ink.library.smu.edu.sg/context/sis_research/article/9683/viewcontent/AISTATS20_full_anderson.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-9683
record_format dspace
spelling sg-smu-ink.sis_research-9683 2024-03-28T09:05:37Z A fast Anderson-Chebyshev acceleration for nonlinear optimization LI, Zhize LI, Jian Anderson acceleration (or Anderson mixing) is an efficient acceleration method for fixed-point iterations $x_{t+1}=G(x_t)$; e.g., gradient descent can be viewed as iteratively applying the operator $G(x) \triangleq x-\alpha\nabla f(x)$. Anderson acceleration is known to be quite efficient in practice and can be viewed as an extension of Krylov subspace methods to nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomials achieves the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves on the previous $O(\kappa\ln\frac{1}{\epsilon})$ result of Toth and Kelley (2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. In addition, if the hyperparameters (e.g., the Lipschitz smoothness parameter $L$) are not available, we propose an algorithm that guesses them dynamically and prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD) and Nesterov's accelerated GD. Furthermore, these algorithms combined with the proposed guessing algorithm (which guesses the hyperparameters dynamically) achieve much better performance. 2020-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8680 https://ink.library.smu.edu.sg/context/sis_research/article/9683/viewcontent/AISTATS20_full_anderson.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
spellingShingle Databases and Information Systems
LI, Zhize
LI, Jian
A fast Anderson-Chebyshev acceleration for nonlinear optimization
description Anderson acceleration (or Anderson mixing) is an efficient acceleration method for fixed-point iterations $x_{t+1}=G(x_t)$; e.g., gradient descent can be viewed as iteratively applying the operator $G(x) \triangleq x-\alpha\nabla f(x)$. Anderson acceleration is known to be quite efficient in practice and can be viewed as an extension of Krylov subspace methods to nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomials achieves the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves on the previous $O(\kappa\ln\frac{1}{\epsilon})$ result of Toth and Kelley (2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. In addition, if the hyperparameters (e.g., the Lipschitz smoothness parameter $L$) are not available, we propose an algorithm that guesses them dynamically and prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD) and Nesterov's accelerated GD. Furthermore, these algorithms combined with the proposed guessing algorithm (which guesses the hyperparameters dynamically) achieve much better performance.
format text
author LI, Zhize
LI, Jian
author_facet LI, Zhize
LI, Jian
author_sort LI, Zhize
title A fast Anderson-Chebyshev acceleration for nonlinear optimization
title_short A fast Anderson-Chebyshev acceleration for nonlinear optimization
title_full A fast Anderson-Chebyshev acceleration for nonlinear optimization
title_fullStr A fast Anderson-Chebyshev acceleration for nonlinear optimization
title_full_unstemmed A fast Anderson-Chebyshev acceleration for nonlinear optimization
title_sort fast anderson-chebyshev acceleration for nonlinear optimization
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/8680
https://ink.library.smu.edu.sg/context/sis_research/article/9683/viewcontent/AISTATS20_full_anderson.pdf
_version_ 1795302170417954816