BEER: Fast O(1/T) rate for decentralized nonconvex optimization with communication compression

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle this bottleneck, many efforts have been made to design communication-compressed algorithms for decentralized nonconvex optimization, where clients are only allowed to communicate a small amount of quantized information (i.e., bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithms in the nonconvex setting still suffer from a slower rate of convergence, $O((G/T)^{2/3})$, than their uncompressed counterparts, where $G$ measures the data heterogeneity across clients and $T$ is the number of communication rounds. This paper proposes BEER, which combines communication compression with gradient tracking, and shows that it converges at the faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate by matching the rate without compression, even under arbitrary data heterogeneity. Numerical experiments corroborate the theory and confirm the practical superiority of BEER in the data-heterogeneous regime.
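The abstract describes BEER as combining communication compression with gradient tracking. Below is a minimal, illustrative NumPy sketch of that idea: each agent keeps a local iterate, a gradient tracker, and compressed surrogates of both, so that only compressed differences cross the network. The toy quadratic objective, ring topology, top-k compressor, and step sizes `gamma`/`eta` are all illustrative assumptions, not the paper's actual algorithmic constants or experimental setup.

```python
# Illustrative sketch of BEER-style compressed gradient tracking.
# Toy setup (assumed, not from the paper): n agents on a ring, each with a
# local quadratic f_i(x) = 0.5 * ||A_i x - b_i||^2, and a top-k compressor.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 20, 4           # agents, dimension, top-k budget (assumed)

A = rng.normal(size=(n, d, d)) / np.sqrt(d)   # heterogeneous local data
b = rng.normal(size=(n, d))

def grad(X):
    """Stacked local gradients: row i is grad f_i(x_i)."""
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n)])

def top_k(M):
    """Contractive top-k compressor, applied row-wise (per agent)."""
    out = np.zeros_like(M)
    for i in range(n):
        idx = np.argsort(np.abs(M[i]))[-k:]
        out[i, idx] = M[i, idx]
    return out

# Doubly stochastic mixing matrix W for a ring topology.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

gamma, eta = 0.2, 0.05       # mixing / gradient step sizes (assumed)
X = np.zeros((n, d))         # local iterates
H = np.zeros((n, d))         # compressed surrogates of X
V = grad(X)                  # gradient trackers (track the average gradient)
G = np.zeros((n, d))         # compressed surrogates of V

for t in range(500):
    g_old = grad(X)
    X = X + gamma * (W @ H - H) - eta * V          # consensus on surrogates + descent
    H = H + top_k(X - H)                           # only the compressed diff is communicated
    V = V + gamma * (W @ G - G) + grad(X) - g_old  # gradient tracking correction
    G = G + top_k(V - G)                           # compressed diff for the tracker

print("avg gradient norm:", np.linalg.norm(grad(X).mean(axis=0)))
```

Because each round communicates only the compressed differences (k of d coordinates per agent here), the per-round bit cost drops sharply, while gradient tracking is what removes the data-heterogeneity term $G$ from the rate, yielding the $O(1/T)$ guarantee the abstract claims.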

Bibliographic Details
Main Authors: ZHAO, Haoyu, LI, Boyue, LI, Zhize, RICHTARIK, Peter, CHI, Yuejie
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8687
https://ink.library.smu.edu.sg/context/sis_research/article/9690/viewcontent/NeurIPS22_full_beer.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/