A fast Anderson-Chebyshev acceleration for nonlinear optimization
Anderson acceleration (or Anderson mixing) is an efficient acceleration method for fixed-point iterations $x_{t+1}=G(x_t)$; for example, gradient descent can be viewed as repeatedly applying the operator $G(x) \triangleq x-\alpha\nabla f(x)$. Anderson acceleration is known to be quite efficient in practice and can be viewed as an extension of Krylov subspace methods to nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomials achieves the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, improving the previous $O(\kappa\ln\frac{1}{\epsilon})$ result of Toth and Kelley (2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. In addition, if the hyperparameters (e.g., the Lipschitz smoothness parameter $L$) are not available, we propose a guessing algorithm that estimates them dynamically and prove a similar convergence rate. Finally, experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms such as vanilla gradient descent (GD) and Nesterov's accelerated GD, and that combining these algorithms with the proposed guessing algorithm (which estimates the hyperparameters dynamically) yields much better performance.
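For readers who want a concrete picture of the fixed-point view described in the abstract, below is a minimal NumPy sketch of plain (windowed, type-II) Anderson mixing applied to the gradient-descent map $G(x) = x - \frac{1}{L}\nabla f(x)$ on a toy quadratic. It illustrates generic Anderson acceleration only, not the paper's Anderson-Chebyshev method or its hyperparameter-guessing algorithm; the function name `anderson_acceleration`, the window size `m`, and the toy problem are choices made here purely for the example.

```python
import numpy as np

def anderson_acceleration(G, x0, m=5, tol=1e-8, max_iter=500):
    """Plain (type-II) Anderson mixing for the fixed-point iteration x <- G(x).

    G  : the fixed-point map, e.g. G(x) = x - (1/L) * grad_f(x)
    x0 : starting point
    m  : history window (how many past iterates are mixed)
    """
    x = np.asarray(x0, dtype=float)
    X_hist, R_hist = [], []              # past G(x) values and residuals G(x) - x
    for _ in range(max_iter):
        gx = G(x)
        r = gx - x                        # fixed-point residual
        if np.linalg.norm(r) < tol:
            return gx
        X_hist.append(gx)
        R_hist.append(r)
        if len(R_hist) > m:               # keep only the last m pairs
            X_hist.pop(0)
            R_hist.pop(0)
        k = len(R_hist)
        if k == 1:
            x = gx                        # not enough history yet: plain step
            continue
        # Solve min_alpha ||sum_i alpha_i r_i|| s.t. sum_i alpha_i = 1 via the
        # usual unconstrained reformulation over residual differences.
        R = np.column_stack(R_hist)                     # d x k
        dR = R[:, 1:] - R[:, :-1]                       # d x (k-1)
        gamma, *_ = np.linalg.lstsq(dR, R[:, -1], rcond=None)
        alpha = np.empty(k)
        alpha[0] = gamma[0]
        alpha[1:-1] = gamma[1:] - gamma[:-1]
        alpha[-1] = 1.0 - gamma[-1]
        x = np.column_stack(X_hist) @ alpha             # mixed next iterate
    return x

if __name__ == "__main__":
    # Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((50, 50))
    A = Q.T @ Q + np.eye(50)              # positive-definite Hessian
    b = rng.standard_normal(50)
    L = np.linalg.eigvalsh(A).max()       # Lipschitz smoothness constant
    G = lambda x: x - (A @ x - b) / L     # gradient-descent fixed-point map
    x_hat = anderson_acceleration(G, np.zeros(50))
    print("||grad f(x_hat)|| =", np.linalg.norm(A @ x_hat - b))
```

On this quadratic toy problem the mixed iterates typically drive the gradient norm to the tolerance in far fewer iterations than the plain map `G` applied on its own, which is the behavior the paper quantifies via the $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$ rate.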
Main Authors: LI, Zhize; LI, Jian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
Subjects: Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/8680
https://ink.library.smu.edu.sg/context/sis_research/article/9683/viewcontent/AISTATS20_full_anderson.pdf
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Record ID: sg-smu-ink.sis_research-9683