Papers
arxiv:2405.19751

HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

Published on May 30, 2024
Authors:
,

Abstract

Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However,the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones. To address these challenges, we introduce the Hybrid Floating-point Quantization for DiT(HQ-DiT), an efficient post-training quantization method that utilizes 4-bit floating-point (FP) precision on both weights and activations for DiT inference. Compared to fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping range selection mechanism, naturally aligns with the data distribution within DiT, resulting in a minimal quantization error. Furthermore, HQ-DiT also implements a universal identity mathematical transform to mitigate the serious quantization error caused by the outliers. The experimental results demonstrate that DiT can achieve extremely low-precision quantization (i.e., 4 bits) with negligible impact on performance. Our approach marks the first instance where both weights and activations in DiTs are quantized to just 4 bits, with only a 0.12 increase in sFID on ImageNet.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2405.19751 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2405.19751 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2405.19751 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.