Additive quantization for truly tiny compressed diffusion models

Tremendous investments have been made towards the commodification of diffusion models for the generation of diverse media. Their mass-market adoption, however, is still hobbled by the intense hardware resource requirements of diffusion model inference. Model quantization strategies tailored specifically towards diffusion models have seen considerable success in easing this burden, yet without exception have explored only the Uniform Scalar Quantization (USQ) family of quantization methods. In contrast, Vector Quantization (VQ) methods, which replace groups of multiple related weights with indices into codebooks, have recently taken the parallel field of Large Language Model (LLM) quantization by storm. In this FYP project, we apply codebook-based additive vector quantization algorithms to the problem of diffusion model compression for the first time. We are rewarded with state-of-the-art results on the important class-conditional benchmark of LDM-4 on ImageNet at 20 inference time steps, including sFID as much as 1.93 points lower than the full-precision model at W4A8, the best-reported results for FID, sFID and ISC at W2A8, and the first-ever successful quantization to W1.5A8 (less than 1.5 bits stored per weight). Furthermore, our proposed method allows for a dynamic trade-off between quantization-time GPU hours and inference-time savings, in line with the recent trend of approaches blending the best aspects of post-training quantization (PTQ) and quantization-aware training (QAT), and demonstrates FLOPs savings on arbitrary hardware via an efficient inference kernel.

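The core idea named in the abstract, storing each group of related weights as a few small codebook indices and reconstructing the group as the sum of the selected codebook entries, can be illustrated with a short sketch. The NumPy code below is a minimal illustration only: the group size, codebook count, codebook size, and the greedy residual encoder are assumptions chosen for clarity, not the project's actual algorithm, hyperparameters, or inference kernel.

```python
# Illustrative sketch only (not the thesis's actual algorithm): additive
# vector quantization, in which each group of d related weights is stored
# as M small codebook indices and reconstructed as the SUM of the selected
# codebook entries. All shapes and the greedy encoder are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d = 8      # weights per group
K = 256    # entries per codebook (indices fit in 8 bits)
M = 2      # number of additive codebooks

# Toy layer weights, viewed as groups of d consecutive weights.
W = rng.standard_normal((4096, d)).astype(np.float32)

# Hypothetical codebooks; real methods learn these from the weights.
codebooks = rng.standard_normal((M, K, d)).astype(np.float32) * 0.5

def encode(groups: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Greedy residual assignment: pick the nearest entry from each codebook in turn."""
    residual = groups.copy()
    indices = np.empty((groups.shape[0], codebooks.shape[0]), dtype=np.int64)
    for m, cb in enumerate(codebooks):
        # Squared distance from every residual group to every entry of codebook m.
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        indices[:, m] = idx
        residual -= cb[idx]
    return indices

def decode(indices: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Reconstruct each group as the sum of its selected codebook entries."""
    out = np.zeros((indices.shape[0], codebooks.shape[2]), dtype=np.float32)
    for m, cb in enumerate(codebooks):
        out += cb[indices[:, m]]
    return out

idx = encode(W, codebooks)
W_hat = decode(idx, codebooks)

# Storage cost: M indices of log2(K) bits each, amortised over d weights.
bits_per_weight = M * np.log2(K) / d   # 2 * 8 / 8 = 2.0 bits per weight here
print(f"bits per weight: {bits_per_weight:.2f}")
print(f"mean squared reconstruction error: {((W - W_hat) ** 2).mean():.4f}")
```

With M codebooks of K entries over groups of d weights, the stored rate is M·log2(K)/d bits per weight; for instance, three codebooks of 16 entries over groups of 8 weights would store 1.5 bits per weight, illustrating how codebook-based schemes can reach the sub-2-bit regimes mentioned in the abstract. Practical additive quantization methods jointly optimise codebooks and assignments rather than using the single greedy pass shown here.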

Bibliographic Details
Main Author: Hasan, Adil
Other Authors: Thomas Peyrin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science; Machine learning
Online Access: https://hdl.handle.net/10356/181210
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-181210
Supervisor: Thomas Peyrin (thomas.peyrin@ntu.edu.sg), College of Computing and Data Science
Degree: Bachelor's degree
Record created: 2024-11-18
File Format: application/pdf
Citation: Hasan, A. (2024). Additive quantization for truly tiny compressed diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181210
Collection: DR-NTU (NTU Library, Singapore)