DESTRESS: Computation-optimal and communication-efficient decentralized nonconvex finite-sum optimization

Bibliographic Details
Main Authors: LI, Boyue, LI, Zhize, CHI, Yuejie
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Online Access: https://ink.library.smu.edu.sg/sis_research/8691
https://ink.library.smu.edu.sg/context/sis_research/article/9694/viewcontent/SIMODS22_destress.pdf
Institution: Singapore Management University
Description
Summary: Emerging applications in multiagent environments such as internet-of-things, networked sensing, autonomous systems, and federated learning call for decentralized algorithms for finite-sum optimization that are resource efficient in terms of both computation and communication. In this paper, we consider the prototypical setting where the agents work collaboratively to minimize the sum of local loss functions by only communicating with their neighbors over a predetermined network topology. We develop a new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS), for nonconvex finite-sum optimization, which matches the optimal incremental first-order oracle complexity of centralized algorithms for finding first-order stationary points, while maintaining communication efficiency. Detailed theoretical and numerical comparisons corroborate that the resource efficiencies of DESTRESS improve upon prior decentralized algorithms over a wide range of parameter regimes. DESTRESS leverages several key algorithm design ideas, including stochastic recursive gradient updates with minibatches for local computation and gradient tracking with extra mixing (i.e., multiple gossiping rounds) for per-iteration communication, together with careful choices of hyperparameters and new analysis frameworks to provably achieve a desirable computation-communication trade-off.
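
Illustrative sketch (not part of the catalog record or the paper): the "extra mixing" idea mentioned in the summary, where agents run several gossiping rounds per iteration, can be pictured with a small pure-Python example. Everything here is an assumption made for illustration: the ring topology, the uniform 1/3 weights, and the helper names `ring_mixing_matrix`, `gossip`, and `consensus_error` are hypothetical and are not taken from DESTRESS. The sketch only shows that repeated neighbor averaging preserves the network-wide mean while shrinking disagreement between agents.

```python
def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Assumption for illustration: each agent weights itself and its two
    ring neighbors equally (1/3 each).
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 / 3.0
        W[i][(i - 1) % n] = 1.0 / 3.0
        W[i][(i + 1) % n] = 1.0 / 3.0
    return W


def gossip(W, values, rounds):
    """Run `rounds` of neighbor averaging; each round computes W @ values."""
    n = len(values)
    for _ in range(rounds):
        values = [sum(W[i][j] * values[j] for j in range(n)) for i in range(n)]
    return values


def consensus_error(values):
    """Largest deviation of any agent's value from the network average."""
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values)


# One scalar per agent; in an optimization method this would be a local
# iterate or a gradient-tracking variable rather than a plain number.
local_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
W = ring_mixing_matrix(len(local_values))

one_round = gossip(W, local_values, rounds=1)
five_rounds = gossip(W, local_values, rounds=5)

print(consensus_error(local_values))
print(consensus_error(one_round))
print(consensus_error(five_rounds))
```

More rounds per iteration cost more communication but bring the agents closer to agreement, which is the kind of computation-communication trade-off the summary refers to.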