BEER: Fast O(1/T) rate for decentralized nonconvex optimization with communication compression
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (i.e., bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithm in the nonconvex setting still suffers from a slower rate of convergence $O((G/T)^{2/3})$ compared with its uncompressed counterpart, where $G$ measures the data heterogeneity across the clients and $T$ is the number of communication rounds. This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate by matching the rate without compression, even under arbitrary data heterogeneity. Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data-heterogeneous regime.
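For intuition, the update pattern the abstract describes, i.e., communication compression combined with gradient tracking, can be sketched in a few lines. The following is a minimal single-process simulation, not the authors' implementation: it assumes a ring topology with a doubly stochastic mixing matrix `W`, a top-k sparsifier as the contractive compressor, and synthetic quadratic local losses; all names, step sizes, and problem sizes are illustrative choices, not values from the paper. Each agent keeps a compressed surrogate `H` of its iterate and `Hg` of its gradient tracker, and only compressed differences ever need to cross a link.

```python
# Hedged sketch of compressed gradient tracking (BEER-style pattern).
# Assumptions (ours, not the paper's): ring graph, top-k compressor,
# quadratic local losses, untuned step sizes gamma and eta.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 20, 10         # agents, dimension, entries kept by top-k
gamma, eta = 0.1, 0.02      # mixing and descent step sizes (illustrative)

# Ring-graph mixing matrix: self weight 1/2, each neighbor 1/4 (doubly stochastic).
W = np.eye(n) / 2
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

# Heterogeneous quadratic local losses f_i(x) = 0.5 * ||A[i] @ x - b[i]||^2.
A = rng.standard_normal((n, 30, d)) / np.sqrt(30)
b = rng.standard_normal((n, 30))

def grad(X):
    """Row i is the local gradient of f_i at agent i's iterate X[i]."""
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n)])

def top_k(V):
    """Row-wise top-k sparsifier: a standard contractive compressor."""
    out = np.zeros_like(V)
    idx = np.argsort(-np.abs(V), axis=1)[:, :k]
    np.put_along_axis(out, idx, np.take_along_axis(V, idx, axis=1), axis=1)
    return out

# X: iterates; G: gradient trackers; H, Hg: compressed surrogates of X and G
# that neighbors actually hold. Only compressed differences are "sent".
X = np.zeros((n, d))
G = grad(X)
H = top_k(X)                # zero here; kept for symmetry with the updates
Hg = top_k(G)

for t in range(3000):
    g_old = grad(X)
    X = X + gamma * (W - np.eye(n)) @ H - eta * G   # mix surrogates, then descend
    H = H + top_k(X - H)                            # send compressed diff of X
    G = G + gamma * (W - np.eye(n)) @ Hg + grad(X) - g_old  # gradient tracking
    Hg = Hg + top_k(G - Hg)                         # send compressed diff of G

x_bar = X.mean(axis=0)
print("consensus error:", np.linalg.norm(X - x_bar, axis=1).max())
print("grad norm at mean:", np.linalg.norm(grad(np.tile(x_bar, (n, 1))).mean(axis=0)))
```

Because only the top-k entries of `X - H` and `G - Hg` are exchanged per round, each link carries O(k) rather than O(d) values, which is the communication saving at stake; the tracking correction `grad(X) - g_old` is what lets the rate avoid the heterogeneity measure $G$ that appears in the abstract.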
Main Authors: | ZHAO, Haoyu; LI, Boyue; LI, Zhize; RICHTARIK, Peter; CHI, Yuejie |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
Subjects: | Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8687 https://ink.library.smu.edu.sg/context/sis_research/article/9690/viewcontent/NeurIPS22_full_beer.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-9690 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-9690 2024-03-28T08:46:03Z 2022-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8687 https://ink.library.smu.edu.sg/context/sis_research/article/9690/viewcontent/NeurIPS22_full_beer.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Databases and Information Systems |
format | text |
author | ZHAO, Haoyu; LI, Boyue; LI, Zhize; RICHTARIK, Peter; CHI, Yuejie |
author_sort | ZHAO, Haoyu |
title | BEER: Fast O(1/T) rate for decentralized nonconvex optimization with communication compression |
title_sort | beer: fast o(1/t) rate for decentralized nonconvex optimization with communication compression |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2022 |
url | https://ink.library.smu.edu.sg/sis_research/8687 https://ink.library.smu.edu.sg/context/sis_research/article/9690/viewcontent/NeurIPS22_full_beer.pdf |
_version_ | 1795302173007937536 |