# FLUX.1-schnell-mflux-v0.6.2-4bit

A 4-bit quantized version of Black Forest Labs' FLUX.1-schnell text-to-image model, produced with the mflux (v0.6.2) quantization approach.
## Overview
This repository contains a 4-bit quantized version of the FLUX.1-schnell model, which significantly reduces the memory footprint while maintaining most of the generation quality. The quantization was performed using the mflux methodology (v0.6.2).
## Original Model
FLUX.1-schnell is a fast text-to-image diffusion model developed by Black Forest Labs, distilled for generation in very few inference steps. It is faster and more efficient than many larger models while still producing high-quality images.
## Benefits of 4-bit Quantization
- Reduced Memory Usage: ~85% reduction in memory requirements compared to the original model
- Faster Loading Times: Smaller model size means quicker initialization
- Lower Storage Requirements: Significantly smaller disk footprint
- Accessibility: Can run on consumer hardware with limited memory (mflux targets Apple Silicon via MLX)
## Model Structure
This repository contains the following components:
- `text_encoder/`: CLIP text encoder (4-bit quantized)
- `text_encoder_2/`: Secondary text encoder (4-bit quantized)
- `tokenizer/`: CLIP tokenizer configuration and vocabulary
- `tokenizer_2/`: Secondary tokenizer configuration
- `transformer/`: Main diffusion model components (4-bit quantized)
- `vae/`: Variational autoencoder for image encoding/decoding (4-bit quantized)
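
If you want to fetch the weights explicitly (for example, to point `mflux-generate --path` at a local copy), here is a minimal sketch using the `huggingface_hub` client; it downloads this repository and checks that the components listed above are present:

```python
from pathlib import Path

from huggingface_hub import snapshot_download

# Download this repository (or reuse the local cache) and get its path.
local_dir = snapshot_download(repo_id="dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit")

# Each component above should exist as a subdirectory of the snapshot.
for name in ("text_encoder", "text_encoder_2", "tokenizer",
             "tokenizer_2", "transformer", "vae"):
    print(name, (Path(local_dir) / name).is_dir())
```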
## Usage
### Requirements
- Python
- PyTorch
- Transformers
- Diffusers
- mflux library (for 4-bit model support)
### Installation
```shell
pip install torch diffusers transformers accelerate
uv tool install mflux  # see the mflux README for more details
```
### Example Usage
```shell
# generate an image from this repository's 4-bit weights
mflux-generate \
  --path "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit" \
  --model schnell \
  --steps 2 \
  --seed 2 \
  --height 1920 \
  --width 1024 \
  --prompt "hot chocolate dish"
```
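
The mflux README also documents a Python API; the sketch below mirrors the CLI example with it. Treat the exact names (`Flux1`, `Config`, `from_name`, `quantize`) as assumptions that may shift between mflux releases, and check the mflux README for your installed version:

```python
from mflux import Flux1, Config

# Load the schnell model; quantize=4 matches the 4-bit weights in this repository.
# (API names follow recent mflux releases and may differ between versions.)
flux = Flux1.from_name(model_name="schnell", quantize=4)

# Generate with the same settings as the CLI example above.
image = flux.generate_image(
    seed=2,
    prompt="hot chocolate dish",
    config=Config(num_inference_steps=2, height=1920, width=1024),
)
image.save(path="image.png")
```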
## Comparison Output
The images generated from the above prompt with the different model precisions are shown at the top of this page. The fp16 and 8-bit results look nearly identical, while the 4-bit result deviates slightly. An 8-bit quantized model is also available for comparison.
## Performance Comparison
| Model Version | Memory Usage | Inference Speed | Quality |
|---|---|---|---|
| Original FP16 | ~57 GB | Base | Base |
| 4-bit Quantized | ~9 GB | Slightly slower | Slightly reduced |
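
As a rough sanity check, the table's own figures are consistent with the ~85% memory reduction quoted above:

```python
# Back-of-the-envelope check using the figures from the table above.
fp16_gb = 57  # approximate fp16 footprint
q4_gb = 9     # approximate 4-bit footprint
print(f"reduction: {1 - q4_gb / fp16_gb:.0%}")  # prints "reduction: 84%", in line with the ~85% claim
```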
## Limitations
- Minor quality degradation compared to the original model
- Slightly slower inference speed
- May exhibit occasional artifacts not present in the original model
## Acknowledgements
- Black Forest Labs for creating the original FLUX.1-schnell model
- Filip Strand for developing the mflux quantization methodology
- The Hugging Face team for their Diffusers and Transformers libraries
## License
This model inherits the license of the original FLUX.1-schnell model. Please refer to the original model repository for licensing information.