BEER: Fast O(1/T) rate for decentralized nonconvex optimization with communication compression

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle this bottleneck, many efforts have been made to design communication-compressed algorithms for decentralized nonconvex optimization, where clients are only allowed to communicate a small amount of quantized information (i.e., bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithms in the nonconvex setting still suffer from a slower rate of convergence, $O((G/T)^{2/3})$, than their uncompressed counterparts, where $G$ measures the data heterogeneity across clients and $T$ is the number of communication rounds. This paper proposes BEER, which combines communication compression with gradient tracking, and shows that it converges at the faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate by matching the rate without compression, even under arbitrary data heterogeneity. Numerical experiments corroborate the theory and confirm the practical superiority of BEER in the data-heterogeneous regime.
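The abstract describes BEER as combining communication compression with gradient tracking. Below is a minimal, illustrative NumPy sketch of that idea: each agent keeps a local iterate, a gradient tracker, and compressed surrogates of both, so that only compressed differences cross the network. The toy quadratic objective, ring topology, top-k compressor, and step sizes `gamma`/`eta` are all illustrative assumptions, not the paper's actual algorithmic constants or experimental setup.

```python
# Illustrative sketch of BEER-style compressed gradient tracking.
# Toy setup (assumed, not from the paper): n agents on a ring, each with a
# local quadratic f_i(x) = 0.5 * ||A_i x - b_i||^2, and a top-k compressor.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 20, 4           # agents, dimension, top-k budget (assumed)

A = rng.normal(size=(n, d, d)) / np.sqrt(d)   # heterogeneous local data
b = rng.normal(size=(n, d))

def grad(X):
    """Stacked local gradients: row i is grad f_i(x_i)."""
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n)])

def top_k(M):
    """Contractive top-k compressor, applied row-wise (per agent)."""
    out = np.zeros_like(M)
    for i in range(n):
        idx = np.argsort(np.abs(M[i]))[-k:]
        out[i, idx] = M[i, idx]
    return out

# Doubly stochastic mixing matrix W for a ring topology.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

gamma, eta = 0.2, 0.05       # mixing / gradient step sizes (assumed)
X = np.zeros((n, d))         # local iterates
H = np.zeros((n, d))         # compressed surrogates of X
V = grad(X)                  # gradient trackers (track the average gradient)
G = np.zeros((n, d))         # compressed surrogates of V

for t in range(500):
    g_old = grad(X)
    X = X + gamma * (W @ H - H) - eta * V          # consensus on surrogates + descent
    H = H + top_k(X - H)                           # only the compressed diff is communicated
    V = V + gamma * (W @ G - G) + grad(X) - g_old  # gradient tracking correction
    G = G + top_k(V - G)                           # compressed diff for the tracker

print("avg gradient norm:", np.linalg.norm(grad(X).mean(axis=0)))
```

Because each round communicates only the compressed differences (k of d coordinates per agent here), the per-round bit cost drops sharply, while gradient tracking is what removes the data-heterogeneity term $G$ from the rate, yielding the $O(1/T)$ guarantee the abstract claims.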

Bibliographic Details
Main Authors: ZHAO, Haoyu, LI, Boyue, LI, Zhize, RICHTARIK, Peter, CHI, Yuejie
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8687
https://ink.library.smu.edu.sg/context/sis_research/article/9690/viewcontent/NeurIPS22_full_beer.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/