FLUX.1-schnell-mflux-v0.6.2-4bit

[comparison_output: side-by-side images from the fp16, 8-bit, and 4-bit models for the example prompt below]

A 4-bit quantized version of the FLUX.1-schnell text-to-image model from Black Forest Labs, implemented using the mflux (version 0.6.2) quantization approach.

Overview

This repository contains a 4-bit quantized version of the FLUX.1-schnell model, which significantly reduces the memory footprint while maintaining most of the generation quality. The quantization was performed using the mflux methodology (v0.6.2).

Original Model

FLUX.1-schnell is the fast, distilled variant of the FLUX.1 text-to-image family from Black Forest Labs. It is designed to produce high-quality images in only a few inference steps, making it quicker and more efficient to run than the other FLUX.1 variants.

Benefits of 4-bit Quantization

  • Reduced Memory Usage: ~85% reduction in memory requirements compared to the original model
  • Faster Loading Times: Smaller model size means quicker initialization
  • Lower Storage Requirements: Significantly smaller disk footprint
  • Accessibility: Can run on consumer hardware with limited VRAM

Model Structure

This repository contains the following components:

  • text_encoder/: CLIP text encoder (4-bit quantized)
  • text_encoder_2/: Secondary text encoder (4-bit quantized)
  • tokenizer/: CLIP tokenizer configuration and vocabulary
  • tokenizer_2/: Secondary tokenizer configuration
  • transformer/: Main diffusion model components (4-bit quantized)
  • vae/: Variational autoencoder for image encoding/decoding (4-bit quantized)
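
For reference, the sketch below downloads this repository with huggingface_hub and lists those components. It is a minimal illustration rather than part of the mflux workflow; it assumes the huggingface_hub package is installed and uses its standard snapshot_download call.

# Download this repository and list its top-level components (illustrative sketch).
from pathlib import Path
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit")
for entry in sorted(Path(local_dir).iterdir()):
    print(entry.name)  # expect text_encoder/, text_encoder_2/, tokenizer/, tokenizer_2/, transformer/, vae/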

Usage

Requirements

  • Python
  • PyTorch
  • Transformers
  • Diffusers
  • mflux library (for 4-bit model support)

Installation

pip install torch diffusers transformers accelerate
uv tool install mflux # check mflux README for more details

Example Usage

# point --path at this repository's quantized weights
mflux-generate \
    --path "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --height 1920 \
    --width 1024 \
    --prompt "hot chocolate dish"

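If you prefer Python over the CLI, the upstream mflux README documents a pattern along the lines of the sketch below. This is a hedged illustration rather than an official example for this repository: the Flux1, Config, from_alias, and generate_image names are taken from the mflux documentation and may differ between mflux versions, so check the README of the version you install.

# Minimal sketch of the mflux Python API (names per the upstream mflux README;
# verify against the mflux version you actually have installed).
from mflux import Flux1, Config

flux = Flux1.from_alias(
    alias="schnell",  # FLUX.1-schnell
    quantize=4,       # 4-bit weights, matching this repository
)

image = flux.generate_image(
    seed=2,
    prompt="hot chocolate dish",
    config=Config(
        num_inference_steps=2,
        height=1920,
        width=1024,
    ),
)
image.save(path="image.png")
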
Comparison Output

The images shown at the top were generated from the prompt above with each model variant.

The fp16 and 8-bit results look nearly identical, while the 4-bit result deviates slightly from them.

An 8-bit quantized model is also available for comparison.

Performance Comparison

Model Version     Memory Usage   Inference Speed   Quality
Original (FP16)   ~57 GB         Baseline          Baseline
4-bit Quantized   ~9 GB          Slightly slower   Slightly reduced
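
The ~85% memory reduction quoted under "Benefits of 4-bit Quantization" follows directly from the approximate sizes in this table:

# Rough arithmetic check of the memory-reduction claim, using the table's values.
fp16_gb, q4_gb = 57, 9
print(f"{(fp16_gb - q4_gb) / fp16_gb:.0%} reduction")  # prints "84% reduction", i.e. roughly 85%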

Limitations

  • Minor quality degradation compared to the original model
  • Slightly slower inference speed
  • May exhibit occasional artifacts not present in the original model

Acknowledgements

  • Black Forest Labs for creating the original FLUX.1-schnell model
  • Filip Strand for developing the mflux quantization methodology
  • The Hugging Face team for their Diffusers and Transformers libraries

License

This model inherits the license of the original FLUX.1-schnell model. Please refer to the original model repository for licensing information.
