ScaleLong: Towards more stable training of diffusion model via scaling network long skip connection

In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs) between distant network blocks can aggregate long-range information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients to smaller values. However, a theoretical understanding of the training instability of UNet in diffusion models, and of the performance improvement brought by LSC scaling, is still lacking. To address this, we theoretically show that the coefficients of the LSCs in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are actually large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed inputs and predicts outputs distant from the desired ones, yielding an oscillatory loss and thus oscillatory gradients. We further characterize the theoretical benefits of scaling the LSC coefficients of UNet for the stability of hidden features and gradients, as well as for robustness. Finally, inspired by our theory, we propose ScaleLong, an effective coefficient-scaling framework that scales the coefficients of the LSCs in UNet and improves its training stability. Experimental results on four well-known datasets show that our methods are superior in stabilizing training and yield about 1.5× training acceleration on different diffusion models with UNet or UViT backbones.
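The scaling idea in the abstract can be sketched in a few lines: each long skip connection is multiplied by a decaying coefficient κ^i before being merged back into the decoder. The toy model below is a minimal NumPy sketch, not the paper's implementation; the random linear maps, tanh blocks, additive skips, and the value κ = 0.7 are all illustrative assumptions (setting κ = 1 recovers plain, unscaled UNet skips).

```python
import numpy as np

def toy_unet_forward(x, depth=3, kappa=0.7):
    """Toy UNet with additive long skip connections (LSCs).

    Encoder level i stores a skip feature; the matching decoder level
    adds it back scaled by kappa**(i + 1). kappa = 1.0 recovers plain
    (unscaled) long skip connections. All blocks are illustrative:
    fixed random linear maps followed by tanh stand in for conv blocks.
    """
    rng = np.random.default_rng(0)  # fixed weights for reproducibility
    n = x.size
    skips, h = [], x
    for i in range(depth):  # encoder path: store one skip per level
        w = rng.standard_normal((n, n)) / np.sqrt(n)
        h = np.tanh(w @ h)
        skips.append(h)
    for i in reversed(range(depth)):  # decoder path with scaled LSCs
        w = rng.standard_normal((n, n)) / np.sqrt(n)
        h = np.tanh(w @ h) + kappa ** (i + 1) * skips[i]
    return h

x = np.ones(8)
out_scaled = toy_unet_forward(x, kappa=0.7)  # scaled LSCs (ScaleLong-style)
out_plain = toy_unet_forward(x, kappa=1.0)   # vanilla UNet skips
```

Because the weights are fixed by the seed, the two calls differ only in the skip scaling; shrinking κ damps the skip contribution, which is the mechanism the paper's analysis connects to narrower oscillation ranges of hidden features and gradients.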

Bibliographic Details
Main Authors: HUANG, Zhongzhan, ZHOU, Pan, YAN, Shuicheng, LIN, Liang
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects: OS and Networks
Online Access:https://ink.library.smu.edu.sg/sis_research/9025
https://ink.library.smu.edu.sg/context/sis_research/article/10028/viewcontent/2023_NeurIPS_scalelong.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10028
record_format dspace
spelling sg-smu-ink.sis_research-10028 2024-07-25T08:03:59Z ScaleLong: Towards more stable training of diffusion model via scaling network long skip connection
2023-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9025 https://ink.library.smu.edu.sg/context/sis_research/article/10028/viewcontent/2023_NeurIPS_scalelong.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University OS and Networks
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic OS and Networks
_version_ 1814047711283904512