HiDream-O1-Image-Dev: MLX Q4 selective

A Q4 quantization that preserves the brightness of HiDream-O1-Image-Dev on Apple Silicon, by keeping the first 2 and last 2 decoder layers in BF16. Fits a 16 GB Mac.

The "pure" Q4 conversion of HiDream-O1 has a known bug: per-group dequantization rounding compounds across the 36 decoder layers and collapses image brightness (cf. the upstream MLX port report). This repo demonstrates that keeping just 4 of the 36 decoder layers in BF16 is sufficient to break the compounding error and preserve fidelity, at a cost of about +1 GB.

Benchmarks (M-class Apple Silicon, 2048×2048, 28 steps, seed=42, identical prompt)

| | BF16 | Pure Q4 | Q4-selective |
|---|---|---|---|
| Disk size | 17.5 GB | 5.7 GB | 6.7 GB |
| Peak RAM | 17.5 GB | 6.5 GB | 7.5 GB |
| Wall time | 115.4 s | 117.9 s | 117.7 s |
| Per step | 4.11 s | 4.21 s | 4.20 s |
| Brightness | ✓ | ✗ collapsed (dark/blue) | ✓ preserved (sunrise) |
| 32-pixel grid in flat regions | absent | present | absent |
| Mac 16 GB compatible | ❌ | ✓ but broken | ✓ + clean |

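Quantization does not improve speed: all three variants run at about 4.1 to 4.2 s per step, which suggests generation at this resolution is compute-bound rather than limited by weight bandwidth. Brightness, however, is fully preserved at near-Q4 RAM cost.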
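As a rough sanity check on the size figures above, the sketch below estimates the effective bits per weight of group-wise Q4 and the extra size from keeping 4 layers in BF16. It assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias, and that the whole BF16-to-Q4 saving is spread evenly over the 36 decoder layers. Both are simplifying assumptions, and the function names are hypothetical.

```python
def bits_per_weight(bits: int, group_size: int, meta_bits: int = 16) -> float:
    """Effective storage per weight for group-wise affine quantization.

    Assumes one scale and one bias of `meta_bits` bits per group.
    """
    return bits + 2 * meta_bits / group_size


def extra_gb_for_bf16_layers(bf16_gb: float, q4_gb: float,
                             n_layers: int, n_kept: int) -> float:
    """Rough upper bound on the size added by keeping `n_kept` layers in BF16.

    Assumes the full BF16-to-Q4 saving is shared evenly by the decoder layers.
    """
    saving_per_layer = (bf16_gb - q4_gb) / n_layers
    return saving_per_layer * n_kept


q4_bits = bits_per_weight(bits=4, group_size=64)
extra = extra_gb_for_bf16_layers(17.5, 5.7, n_layers=36, n_kept=4)

print(f"Q4 effective bits per weight: {q4_bits}")
print(f"Ideal BF16/Q4 size ratio: {16 / q4_bits:.2f}")
print(f"Estimated extra size for 4 BF16 layers: {extra:.2f} GB")
```

The estimate comes out at roughly 1.3 GB, slightly above the measured +1.0 GB, which is expected if part of the Q4 saving comes from weights outside the decoder layers.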

Install + run

mkdir hidream-q4-sel && cd hidream-q4-sel
uv venv --python 3.11 .venv
# Quote the version specifiers so the shell does not treat > and < as redirections
uv pip install --python .venv/bin/python \
  "mlx>=0.31.2" "mlx-vlm>=0.5.0" "transformers>=4.57.0,<6.0" \
  huggingface_hub "safetensors>=0.6" "numpy>=2.0" pillow tqdm sentencepiece

# Download this repo
.venv/bin/python -c "from huggingface_hub import snapshot_download; \
  snapshot_download('ambassadia/HiDream-O1-Image-Dev-mlx-q4-selective', local_dir='.')"

# Pull the generation script from the upstream BF16 repo
.venv/bin/python -c "from huggingface_hub import hf_hub_download; \
  hf_hub_download('mlx-community/HiDream-O1-Image-Dev-mlx-bf16', \
                  'scripts/hidream_o1/generate_hidream_o1_mlx.py', local_dir='.')"

# Generate
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path . \
  --prompt "your prompt here" \
  --output out.png \
  --seed 42

Sample output

The included sample_outputs/lake_q4_sel.png was generated from the prompt "a serene mountain lake at sunrise, mist rising off the water, photographic, Kodak Tri-X 400" with seed=42, 2048×2048, on this Q4-selective port.

Reproduce the conversion

The quantize_selective.py script in this repo re-quantizes from a local BF16 MLX export. It is useful if you want to tune which layers stay in BF16 (e.g. 0,1,2,3,32,33,34,35 for an even cleaner result at +2 GB, or 0,17,34 for a cheaper variant).

# Starting from mlx-community/HiDream-O1-Image-Dev-mlx-bf16 already cloned in ./bf16-source
python quantize_selective.py \
  --bf16-source ./bf16-source \
  --out-dir ./q4-selective \
  --keep-bf16-layers 0,1,34,35
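The layer selection behind --keep-bf16-layers can be sketched as a small predicate over module paths. This is an illustration only, not the actual contents of quantize_selective.py: the helper names and the assumption that decoder modules have paths containing "layers.<index>." are hypothetical.

```python
import re

# Hypothetical path convention: decoder modules look like "...layers.<index>...."
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)(?:\.|$)")


def parse_layer_list(spec: str) -> set[int]:
    """Turn a comma-separated string such as "0,1,34,35" into a set of indices."""
    return {int(tok) for tok in spec.split(",") if tok.strip()}


def should_quantize(path: str, keep_bf16: set[int]) -> bool:
    """Return False for modules inside a decoder layer that must stay in BF16."""
    match = _LAYER_RE.search(path)
    if match is None:
        # Not inside a numbered decoder layer: treat it as pure Q4 would.
        return True
    return int(match.group(1)) not in keep_bf16


keep = parse_layer_list("0,1,34,35")
for path in ("model.layers.0.self_attn.q_proj",
             "model.layers.17.mlp.up_proj",
             "model.layers.35.mlp.down_proj"):
    print(path, "->", "Q4" if should_quantize(path, keep) else "BF16")
```

A predicate of this shape could plausibly be wrapped and passed as the class_predicate argument of mlx.nn.quantize, which receives a module path and the module. Such a wrapper would also need to return False for modules that cannot be quantized at all.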

Why selective quantization works here

The HiDream-O1 decoder has 36 transformer layers. When every layer is quantized to Q4 with group_size=64, the per-group rounding error compounds linearly through the residual stream. The brightness distribution of the model's intermediate hidden state drifts toward zero, manifesting as dark/moody outputs.

By keeping the first 2 and last 2 decoder layers in BF16:

  • Layers 0-1 (input layers) see the unquantized text embeddings + image patches and produce clean early features
  • Layers 34-35 (output layers) project the final hidden state to the patch-prediction space without rounding loss
  • The 32 middle layers can absorb the Q4 noise without bleeding it into the input/output boundary
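To make the per-group rounding error concrete, here is a minimal pure-Python sketch of group-wise 4-bit affine quantization with group_size=64. It is a generic min/max scheme written for illustration and is not claimed to match MLX's exact kernel. It shows only the single-tensor round-trip error, not how that error compounds across layers. The synthetic weights and all function names are assumptions.

```python
import random


def quantize_group(values, bits=4):
    """Affine-quantize one group; returns (integer codes, scale, offset)."""
    levels = (1 << bits) - 1                # 15 steps for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0       # guard against constant groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point values."""
    return [lo + c * scale for c in codes]


def roundtrip(weights, group_size=64, bits=4):
    """Quantize and dequantize a flat weight list, one group at a time."""
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, lo = quantize_group(group, bits)
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(64 * 32)]  # synthetic weights
restored = roundtrip(weights)

max_err = max(abs(a - b) for a, b in zip(weights, restored))
mean_err = sum(a - b for a, b in zip(weights, restored)) / len(weights)
print(f"max abs error: {max_err:.6f}, mean signed error: {mean_err:.6f}")
```

Within each group the error of any single weight is bounded by half a quantization step, so one layer alone looks harmless. The concern described above is what happens when 36 such layers feed into each other through the residual stream.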

This approach is documented and reproducible. The upstream MLX port author flagged the brightness collapse and rejected Q4 outright; selective quantization makes Q4 viable again.

License

MIT, matching upstream HiDream-ai/HiDream-O1-Image-Dev.

Provenance
