Lmxyy committed (verified)
Commit 7537dc6 · 1 Parent(s): eb01adc

Update README.md

Files changed (1)
  1. README.md +19 -2
README.md CHANGED
@@ -30,14 +30,31 @@ library_name: diffusers
  <a href='https://hanlab.mit.edu/projects/svdquant'>[Website]</a>&ensp;
  <a href='https://hanlab.mit.edu/blog/svdquant'>[Blog]</a>
  </div>
-
  ![teaser](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/teaser.jpg)
  SVDQuant is a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On the 12B FLUX.1-dev model, it achieves a 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it offers an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, and is 3× faster than the NF4 W4A16 baseline. On PixArt-∑, it demonstrates significantly superior visual quality over other W4A4 and even W4A8 baselines. "E2E" means the end-to-end latency including the text encoder and VAE decoder.

  ## Method
  #### Quantization Method -- SVDQuant

- ![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif)Overview of SVDQuant. Stage 1: Originally, both the activation $\boldsymbol{X}$ and weights $\boldsymbol{W}$ contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from activations to weights, resulting in the updated activation $\hat{\boldsymbol{X}}$ and weights $\hat{\boldsymbol{W}}$. While $\hat{\boldsymbol{X}}$ becomes easier to quantize, $\hat{\boldsymbol{W}}$ now becomes more difficult. Stage 3: SVDQuant further decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$ with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
+ ![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif)
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>LaTeX Rendering Example</title>
+ <script type="text/javascript" async
+ src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
+ </script>
+ </head>
+ <body>
+ <p>
+ The key idea behind SVDQuant is to introduce an additional low-rank branch that can absorb quantization difficulties in both weights and activations. As shown in the above animation, originally, both the activation \( \boldsymbol{X} \) and weights \( \boldsymbol{W} \) contain massive outliers, making 4-bit quantization challenging. We can first aggregate the outliers by migrating them from activations to weights via smoothing, resulting in the updated activation \( \hat{\boldsymbol{X}} \) and weights \( \hat{\boldsymbol{W}} \). While \( \hat{\boldsymbol{X}} \) becomes easier to quantize, \( \hat{\boldsymbol{W}} \) now becomes more difficult. At the last stage, SVDQuant further decomposes \( \hat{\boldsymbol{W}} \) into a low-rank component \( \boldsymbol{L}_1 \boldsymbol{L}_2 \) and a residual \( \hat{\boldsymbol{W}} - \boldsymbol{L}_1 \boldsymbol{L}_2 \) with Singular Value Decomposition (SVD). As the singular value distribution of \( \hat{\boldsymbol{W}} \) is highly imbalanced, with only the first several values being significantly larger, removing these dominant values can dramatically reduce \( \hat{\boldsymbol{W}} \)’s magnitude and outliers, as suggested by the <a href='https://en.wikipedia.org/wiki/Low-rank_approximation'>Eckart-Young-Mirsky theorem</a>. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision. The figure below illustrates an example value distribution of the input activations and weights in PixArt-∑.
+ </p>
+ </body>
+ </html>
+
+ Overview of SVDQuant. Stage 1: Originally, both the activation $\boldsymbol{X}$ and weights $\boldsymbol{W}$ contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from activations to weights, resulting in the updated activation $\hat{\boldsymbol{X}}$ and weights $\hat{\boldsymbol{W}}$. While $\hat{\boldsymbol{X}}$ becomes easier to quantize, $\hat{\boldsymbol{W}}$ now becomes more difficult. Stage 3: SVDQuant further decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$ with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
 
  #### Nunchaku Engine Design
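
For a concrete sense of the quantization method the added paragraph describes: smooth the activations, peel off a rank-$r$ branch of $\hat{\boldsymbol{W}}$ with SVD and keep it in high precision, and quantize only the residual, so that roughly $\boldsymbol{X}\boldsymbol{W} \approx \hat{\boldsymbol{X}}\boldsymbol{L}_1\boldsymbol{L}_2 + Q(\hat{\boldsymbol{X}})\,Q(\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2)$. The NumPy sketch below is only a minimal illustration of that idea, not the nunchaku kernels; the helper names, the group size, and the smoothing exponent `alpha` are assumptions made for this example.

```python
import numpy as np

def quantize_int4(t, group_size=64):
    """Toy symmetric 4-bit fake-quantization with per-group scales.

    Assumes t.size is divisible by group_size; returns dequantized values.
    """
    flat = t.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0 + 1e-8
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(t.shape)

def svdquant_linear(X, W, rank=32, alpha=0.5):
    """Approximate X @ W with smoothing, a full-precision low-rank branch, and a 4-bit residual."""
    # Stage 2: migrate activation outliers into the weights via per-channel smoothing.
    s = np.abs(X).max(axis=0) ** alpha + 1e-8      # one scale per input channel
    X_hat, W_hat = X / s, W * s[:, None]           # X @ W == X_hat @ W_hat exactly

    # Stage 3: split W_hat into a low-rank component L1 @ L2 and a residual via SVD.
    U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)
    L1, L2 = U[:, :rank] * S[:rank], Vt[:rank]
    residual = W_hat - L1 @ L2                     # smaller magnitude, fewer outliers

    # The low-rank branch stays in full precision; activations and residual are quantized.
    return quantize_int4(X_hat) @ quantize_int4(residual) + X_hat @ (L1 @ L2)

# Toy check: the approximation should track the full-precision product.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 512))
W = rng.standard_normal((512, 256))
print("mean abs error:", np.abs(svdquant_linear(X, W) - X @ W).mean())
```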