It's been a while since we shipped native quantization support in diffusers 🧨. We currently support bitsandbytes as the official backend, but using other backends like torchao is already very simple. This post is just a reminder of what's possible (minimal sketches for each point follow the list):
1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints
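Points 1 and 2 in code: a minimal sketch assuming the bitsandbytes backend, with the Flux transformer as the example component and NF4 as the 4-bit scheme. The repo id and the local save path are just placeholders, not part of the original post:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# 4-bit NF4 quantization config for the bitsandbytes backend
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 1. Load the model and quantize it on the fly via the config
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# 2. Save the quantized weights together with the quantization config
transformer.save_pretrained("flux-transformer-nf4")
```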
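Point 3: loading a checkpoint that was already saved quantized. The quantization config is read from the checkpoint itself, so nothing extra is needed; "your-username/flux-transformer-nf4" is a hypothetical repo standing in for a checkpoint saved as in the previous sketch:

```python
import torch
from diffusers import FluxTransformer2DModel

# The saved quantization config is picked up from the checkpoint automatically
transformer = FluxTransformer2DModel.from_pretrained(
    "your-username/flux-transformer-nf4",  # placeholder repo id
    torch_dtype=torch.bfloat16,
)
```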
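Point 4: quantized components still work with model-level CPU offloading. A sketch assuming a Flux pipeline whose transformer was quantized as above; the prompt is only illustrative:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
# Offload each sub-model to CPU when it is not in use
pipe.enable_model_cpu_offload()

image = pipe("a tiny astronaut hatching from an egg on the moon").images[0]
```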
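Point 5: LoRAs trained on top of a quantized base can be loaded straight into the quantized pipeline; "your-username/my-flux-lora" is a placeholder for a LoRA you trained yourself:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Load LoRA weights on top of the quantized transformer
pipe.load_lora_weights("your-username/my-flux-lora")  # placeholder repo id
```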
Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes