Model Compression with NNCF

Usage

  1. Use the Diffusers backend: Execution & Models -> Execution backend
  2. Go into Compute Settings
  3. Enable the Compress Model weights with NNCF options (a sketch of the underlying call follows below)
  4. Restart the WebUI if it's your first time using NNCF; otherwise, just reload the model.
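Under the hood, enabling these options applies NNCF weight compression to the selected model parts. A minimal sketch of the equivalent standalone call, assuming a diffusers SDXL pipeline; the model ID and variable names here are illustrative, not SD.Next's actual code path:

```python
import torch
import nncf  # pip install nncf
from diffusers import StableDiffusionXLPipeline

# Load the pipeline in 16-bit, as the Diffusers backend does.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# nncf.compress_weights() defaults to INT8 weight-only compression:
# Linear/Conv weights are stored as INT8 and decompressed back to 16-bit
# on the fly at inference time (hence the autocast slowdown noted under
# Disadvantages below).
pipe.unet = nncf.compress_weights(pipe.unet)
```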

Features

  • Uses INT8 weights, roughly halving the model size in memory
    (saves about 3.4 GB of VRAM with SDXL)
  • Works in the Diffusers backend
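The halving follows from storage width: 16-bit weights take 2 bytes per parameter, INT8 takes 1. A back-of-the-envelope check, using approximate public parameter counts for SDXL (the counts are assumptions for illustration, not numbers from this wiki):

```python
# Approximate SDXL parameter counts (assumptions, for illustration only):
params = {
    "unet": 2.57e9,           # SDXL UNet
    "text_encoders": 0.82e9,  # CLIP ViT-L + OpenCLIP ViT-bigG
    "vae": 0.08e9,            # autoencoder
}

# FP16 = 2 bytes/param, INT8 = 1 byte/param -> 1 byte saved per parameter.
saved_gb = sum(params.values()) / 1024**3
print(f"~{saved_gb:.1f} GB saved")  # ~3.2 GB, in line with the ~3.4 GB above
```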

Disadvantages

  • It uses autocast: the GPU still runs the model in 16-bit, decompressing
    weights on the fly, so inference is slower
  • Uses INT8, which can break ControlNet
  • Applying a LoRA will trigger a model reload
  • Not implemented in the Original backend
  • Fused projections are not compatible with NNCF (see the sketch below)
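If fused QKV projections were enabled elsewhere, they need to be undone before compressing; a hedged illustration using the public diffusers API, assuming `pipe` is an already-loaded SDXL-class pipeline as in the sketch above:

```python
# Fused QKV projections merge the attention Q/K/V weights into one matrix,
# which NNCF's compressed modules cannot wrap; unfuse before compressing.
pipe.unfuse_qkv_projections()  # public diffusers API on SDXL-class pipelines
pipe.unet = nncf.compress_weights(pipe.unet)
```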

Options

These results compare NNCF 8-bit to 16-bit.

  • Model:
    Compresses the UNet or Transformer part of the model.
    This is where most of the memory savings happen for Stable Diffusion.

    SDXL: ~2500 MB memory savings.
    SD 1.5: ~750 MB memory savings.
    PixArt-XL-2: ~600 MB memory savings.

  • Text Encoder:
    Compresses the Text Encoder parts of the model.
    This is where most of the memory savings happen for PixArt.

    PixArt-XL-2: ~4750 MB memory savings.
    SDXL: ~750 MB memory savings.
    SD 1.5: ~120 MB memory savings.

  • VAE:
    Compresses the VAE part of the model.
    Memory savings from compressing the VAE are fairly small.

    SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings.

  • 4 Bit Compression and Quantization:
    4-bit compression modes and quantization can be used with the OpenVINO backend;
    a sketch follows below.
    For more info: https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
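For reference, a minimal sketch of 4-bit weight compression through NNCF on an OpenVINO model, per the linked page. The file paths are placeholders, and the mode/ratio/group_size values shown are illustrative uses of the public nncf.compress_weights() parameters, not SD.Next settings:

```python
import nncf
import openvino as ov

core = ov.Core()
ov_model = core.read_model("unet.xml")  # placeholder path to a converted UNet

compressed = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weights
    ratio=0.8,      # fraction of layers compressed to 4-bit; the rest stay 8-bit
    group_size=64,  # per-group quantization granularity
)
ov.save_model(compressed, "unet_int4.xml")
```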