HiDream-O1-Image-Dev — MLX Q4 selective

A Q4 quantization that preserves the brightness of HiDream-O1-Image-Dev on Apple Silicon, by keeping the first 2 and last 2 decoder layers in BF16. Fits a 16 GB Mac.

The "pure" Q4 conversion of HiDream-O1 has a known bug: per-group dequantization rounding error compounds across the 36 decoder layers and collapses image brightness (see the upstream MLX port's report). This repo demonstrates that keeping just 4 of the 36 decoder layers in BF16 is enough to break the compounding and preserve fidelity, at a cost of about 1 GB.

Benchmarks (M-class Apple Silicon, 2048×2048, 28 steps, seed=42, identical prompt)

Metric	BF16	Pure Q4	Q4-selective
Disk size	17.5 GB	5.7 GB	6.7 GB
Peak RAM	17.5 GB	6.5 GB	7.5 GB
Wall time	115.4 s	117.9 s	117.7 s
Per step	4.11 s	4.21 s	4.20 s
Brightness	✓	✗ collapsed (dark/blue)	✓ preserved (sunrise)
32-pixel grid in flat regions	absent	present	absent
Fits a 16 GB Mac	✗	✓ (but output broken)	✓ (clean output)

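Neither Q4 variant is faster than BF16: per-step time is essentially unchanged, which suggests that generation at this resolution is compute-bound rather than limited by weight bandwidth. The gain is in memory: brightness is fully preserved at near-Q4 RAM cost.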

Install + run

mkdir hidream-q4-sel && cd hidream-q4-sel
uv venv --python 3.11 .venv
# Version specifiers are quoted so the shell does not treat > and < as redirections
uv pip install --python .venv/bin/python \
  "mlx>=0.31.2" "mlx-vlm>=0.5.0" "transformers>=4.57.0,<6.0" \
  huggingface_hub "safetensors>=0.6" "numpy>=2.0" pillow tqdm sentencepiece

# Download this repo
.venv/bin/python -c "from huggingface_hub import snapshot_download; \
  snapshot_download('ambassadia/HiDream-O1-Image-Dev-mlx-q4-selective', local_dir='.')"

# Pull the generation script from the upstream BF16 repo
.venv/bin/python -c "from huggingface_hub import hf_hub_download; \
  hf_hub_download('mlx-community/HiDream-O1-Image-Dev-mlx-bf16', \
                  'scripts/hidream_o1/generate_hidream_o1_mlx.py', local_dir='.')"

# Generate
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path . \
  --prompt "your prompt here" \
  --output out.png \
  --seed 42

Sample output

The included sample_outputs/lake_q4_sel.png was generated with this Q4-selective port from the prompt "a serene mountain lake at sunrise, mist rising off the water, photographic, Kodak Tri-X 400", at 2048×2048 with seed=42.
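
To compare brightness between variants numerically rather than by eye, one option is the mean luma of each render. The sketch below is not part of this repo: `mean_luma` is a hypothetical helper using the Rec. 601 weights, and it takes plain RGB tuples so that it runs without extra dependencies. The pixel values are toy stand-ins, not measurements from the model.

```python
# Hypothetical helper (not part of this repo): mean Rec. 601 luma of RGB pixels.
def mean_luma(pixels):
    """Return the average luma (0-255 scale) of an iterable of (r, g, b) tuples."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return total / count

# Toy stand-ins for two renders: a mid-gray patch and a darkened, blue-shifted copy.
reference = [(128, 128, 128)] * 4
darkened = [(32, 32, 48)] * 4
print(mean_luma(reference), mean_luma(darkened))
```

With Pillow installed, `ImageStat.Stat(Image.open("out.png").convert("L")).mean[0]` gives a comparable number, since Pillow's "L" conversion uses the same weights. A pure-Q4 render scoring well below the BF16 render of the same prompt and seed would be consistent with the collapse reported in the benchmarks.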

Reproduce the conversion

The quantize_selective.py script in this repo re-quantizes from a local BF16 MLX export. This is useful if you want to tune which layers stay in BF16 (e.g. 0,1,2,3,32,33,34,35 for an even cleaner result at +2 GB, or 0,17,34 for a cheaper variant).

# Starting from mlx-community/HiDream-O1-Image-Dev-mlx-bf16 already cloned in ./bf16-source
python quantize_selective.py \
  --bf16-source ./bf16-source \
  --out-dir ./q4-selective \
  --keep-bf16-layers 0,1,34,35
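
For a rough sense of what other layer choices cost, the benchmark figures imply about 0.25 GB of extra disk per BF16 layer (6.7 GB minus 5.7 GB, spread over 4 layers). The sketch below extrapolates linearly from that. It assumes every decoder layer is the same size, which this README does not state, and `parse_layers` and `estimate_disk_gb` are hypothetical helpers, not part of quantize_selective.py.

```python
# Hypothetical helpers: linear extrapolation from the benchmark table in this README.
PURE_Q4_GB = 5.7        # disk size with every decoder layer in Q4
SELECTIVE_GB = 6.7      # disk size with layers 0,1,34,35 kept in BF16
GB_PER_BF16_LAYER = (SELECTIVE_GB - PURE_Q4_GB) / 4

def parse_layers(spec):
    """Parse a comma-separated layer list such as '0,1,34,35' into sorted unique ints."""
    return sorted({int(part) for part in spec.split(",") if part.strip()})

def estimate_disk_gb(spec):
    """Estimate disk size, assuming each BF16 layer costs the same extra space."""
    return PURE_Q4_GB + GB_PER_BF16_LAYER * len(parse_layers(spec))

for spec in ("0,17,34", "0,1,34,35", "0,1,2,3,32,33,34,35"):
    print(spec, round(estimate_disk_gb(spec), 2), "GB")
```

Under this assumption the 8-layer variant lands at roughly 7.7 GB, in line with the +2 GB quoted above. Peak RAM should shift by a similar amount, since the table shows the same 1 GB gap for disk and RAM.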

Why selective quantization works here

The HiDream-O1 decoder has 36 transformer layers. When every layer is quantized to Q4 with group_size=64, the per-group rounding error compounds linearly through the residual stream. The brightness distribution of the model's intermediate hidden state drifts toward zero, manifesting as dark/moody outputs.
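
To make the per-group rounding error concrete, here is a minimal pure-Python sketch of 4-bit affine quantization over groups of 64 values. It illustrates the general technique (a per-group scale and offset with 16 levels), not MLX's exact kernel, and the weights are synthetic rather than taken from the model.

```python
import random

GROUP_SIZE = 64   # matches the group_size quoted above
LEVELS = 15       # 4 bits give integer codes 0..15

def quantize_group(values):
    """Map one group to 4-bit codes plus a per-group scale and offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / LEVELS or 1.0   # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Reconstruct approximate values from the codes."""
    return [lo + c * scale for c in codes]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE * 4)]

max_err = 0.0
max_half_step = 0.0
for start in range(0, len(weights), GROUP_SIZE):
    group = weights[start:start + GROUP_SIZE]
    codes, scale, lo = quantize_group(group)
    restored = dequantize_group(codes, scale, lo)
    max_err = max(max_err, max(abs(a - b) for a, b in zip(group, restored)))
    max_half_step = max(max_half_step, scale / 2)

print(f"max round-trip error: {max_err:.6f} (half-step bound: {max_half_step:.6f})")
```

Each weight can be off by up to half a quantization step after the round trip. The claim in this section is that these small per-layer errors add up when all 36 layers are quantized.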

By keeping the first 2 and last 2 decoder layers in BF16:

Layers 0-1 (input layers) see the unquantized text embeddings and image patches and produce clean early features.
Layers 34-35 (output layers) project the final hidden state to the patch-prediction space without rounding loss.
The 32 middle layers can absorb the Q4 noise without bleeding it into the input/output boundary.
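
The layer selection itself can be expressed as a predicate on module paths. The sketch below is an assumption about how such a filter might look, not the contents of quantize_selective.py: `should_quantize` is hypothetical, and the `layers.<index>.` path pattern is a guess at the naming used in the MLX export.

```python
import re

KEEP_BF16 = {0, 1, 34, 35}   # first 2 and last 2 of the 36 decoder layers

# Assumed naming: decoder blocks appear in module paths as "layers.<index>."
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)(?:\.|$)")

def should_quantize(path, keep=KEEP_BF16):
    """Return False for modules inside a kept layer, True otherwise."""
    match = LAYER_RE.search(path)
    if match is None:
        # Not inside a numbered decoder layer; quantizing these is an
        # arbitrary default for this sketch.
        return True
    return int(match.group(1)) not in keep

for path in ("model.layers.0.self_attn.q_proj",
             "model.layers.17.mlp.down_proj",
             "model.layers.35.mlp.up_proj"):
    print(path, "->", "Q4" if should_quantize(path) else "BF16")
```

MLX's `mlx.nn.quantize` accepts a `class_predicate(path, module)` callable, so a function like this could plausibly be wired in as `nn.quantize(model, group_size=64, bits=4, class_predicate=lambda p, m: isinstance(m, nn.Linear) and should_quantize(p))`. Whether the script in this repo does so is not stated here.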

This approach is documented and reproducible. The upstream MLX port author flagged the brightness collapse and rejected Q4 outright; selective quantization makes Q4 viable again.

License

MIT, matching upstream HiDream-ai/HiDream-O1-Image-Dev.

Provenance

Base model: HiDream-ai/HiDream-O1-Image-Dev (MIT)
BF16 MLX port: mlx-community/HiDream-O1-Image-Dev-mlx-bf16 (MIT)
Q4-selective port: this repo, by Olivier Dupont (HF: ambassadia), 2026-05-10
