HiDream-O1-Image-Dev: MLX Q4 selective
A Q4 quantization that preserves the brightness of HiDream-O1-Image-Dev on Apple Silicon, by keeping the first 2 and last 2 decoder layers in BF16. Fits a 16 GB Mac.
The "pure" Q4 conversion of HiDream-O1 has a known bug: per-group dequantization rounding error compounds across the 36 decoder layers and collapses image brightness (cf. the upstream MLX port report). This repo shows that keeping just 4 of the 36 decoder layers in BF16 is enough to break the compounding error and preserve fidelity, at a cost of about 1 GB.
Benchmarks (M-class Apple Silicon, 2048×2048, 28 steps, seed=42, identical prompt)

| Metric | BF16 | Pure Q4 | Q4-selective |
|---|---|---|---|
| Disk size | 17.5 GB | 5.7 GB | 6.7 GB |
| Peak RAM | 17.5 GB | 6.5 GB | 7.5 GB |
| Wall time | 115.4 s | 117.9 s | 117.7 s |
| Per step | 4.11 s | 4.21 s | 4.20 s |
| Brightness | ✓ | ✗ collapsed (dark/blue) | ✓ preserved (sunrise) |
| 32-pixel grid in flat regions | absent | present | absent |
| Fits a 16 GB Mac | ✗ | ✓ but broken | ✓ + clean |
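Neither Q4 variant is faster than BF16. At 2048×2048 every step pushes a large number of image tokens through each layer, so step time is most likely dominated by compute rather than by weight loading: 4-bit weights cut memory traffic, not FLOPs, and add a small dequantization cost. The gain is memory: Q4-selective fully preserves brightness at near-Q4 RAM cost.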
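Whether a layer is limited by memory bandwidth or by compute can be estimated roofline-style from its arithmetic intensity, the FLOPs performed per byte moved. The sketch below does this for a single linear layer y = x·W. The 4096 width and the token counts are illustrative assumptions, not HiDream-O1's actual configuration, and the 4.5 bits/weight figure assumes 4-bit codes plus one BF16 scale and one BF16 bias per group of 64.

```python
def matmul_intensity(tokens: int, d_in: int, d_out: int,
                     weight_bits: float, act_bytes: int = 2) -> float:
    """FLOPs per byte for y = x @ W, with x of shape (tokens, d_in).

    Counts one read of W, one read of x and one write of y (activations in BF16).
    """
    flops = 2 * tokens * d_in * d_out                   # one multiply-add per weight per token
    weight_bytes = d_in * d_out * weight_bits / 8
    act_io_bytes = tokens * (d_in + d_out) * act_bytes  # read x, write y
    return flops / (weight_bytes + act_io_bytes)


BF16_BITS = 16.0
Q4_BITS = 4 + 2 * 16 / 64  # 4-bit codes + BF16 scale and bias per group of 64 = 4.5

if __name__ == "__main__":
    d = 4096  # hypothetical hidden size
    for tokens in (1, 256, 16384):  # hypothetical token counts per forward pass
        bf16 = matmul_intensity(tokens, d, d, BF16_BITS)
        q4 = matmul_intensity(tokens, d, d, Q4_BITS)
        print(f"tokens={tokens:>6}  BF16: {bf16:8.1f} FLOP/B   Q4: {q4:8.1f} FLOP/B")
```

With a single token the weights dominate the traffic, intensity is about 1 FLOP/byte, and 4-bit weights raise it by roughly 16/4.5. With thousands of tokens both variants approach the same ceiling (about d/2 here, set by activation traffic), so comparing these numbers with a chip's ratio of peak FLOPs to memory bandwidth indicates which regime a given workload falls in.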
Install + run
```bash
mkdir hidream-q4-sel && cd hidream-q4-sel
uv venv --python 3.11 .venv
# Quote the version specifiers: an unquoted ">" or "<" is a shell redirection
uv pip install --python .venv/bin/python \
  "mlx>=0.31.2" "mlx-vlm>=0.5.0" "transformers>=4.57.0,<6.0" \
  huggingface_hub "safetensors>=0.6" "numpy>=2.0" pillow tqdm sentencepiece

# Download this repo
.venv/bin/python -c "from huggingface_hub import snapshot_download; \
snapshot_download('ambassadia/HiDream-O1-Image-Dev-mlx-q4-selective', local_dir='.')"

# Pull the generation script from the upstream BF16 repo
.venv/bin/python -c "from huggingface_hub import hf_hub_download; \
hf_hub_download('mlx-community/HiDream-O1-Image-Dev-mlx-bf16', \
'scripts/hidream_o1/generate_hidream_o1_mlx.py', local_dir='.')"

# Generate
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path . \
  --prompt "your prompt here" \
  --output out.png \
  --seed 42
```
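After the download, one way to check which decoder layers were left unquantized is to read the safetensors headers directly: each file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor's dtype. This sketch assumes decoder tensors are named like `...layers.<index>....weight` and that MLX stores Q4-packed weights as `U32`; both are assumptions about this repo's layout, and the helper names are hypothetical.

```python
import json
import re
import struct
from collections import defaultdict
from pathlib import Path
from typing import BinaryIO

# Assumption: decoder tensors are named like "model.layers.<idx>.<...>.weight".
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def read_safetensors_header(fp: BinaryIO) -> dict:
    """Return the tensor table of a .safetensors file (name -> dtype/shape/offsets)."""
    (header_len,) = struct.unpack("<Q", fp.read(8))  # u64, little-endian
    header = json.loads(fp.read(header_len))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return header


def layer_dtypes(header: dict) -> dict[int, set[str]]:
    """Map each decoder layer index to the dtypes of its '.weight' tensors."""
    out: dict[int, set[str]] = defaultdict(set)
    for name, info in header.items():
        match = LAYER_RE.search(name)
        # Q4 scales/biases end in '.scales'/'.biases', so only '.weight' is checked
        if match and name.endswith(".weight"):
            out[int(match.group(1))].add(info["dtype"])
    return dict(out)


def bf16_layers(model_dir: str) -> list[int]:
    """Layer indices whose '.weight' tensors are all BF16 across every shard."""
    merged: dict[int, set[str]] = defaultdict(set)
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with shard.open("rb") as fp:
            for idx, dtypes in layer_dtypes(read_safetensors_header(fp)).items():
                merged[idx] |= dtypes
    return sorted(i for i, dtypes in merged.items() if dtypes == {"BF16"})
```

A quantized layer still has BF16 norm weights next to its `U32` packed weights, so only layers whose weights are BF16 throughout are reported. If the naming assumptions hold, running `print(bf16_layers("."))` from the `hidream-q4-sel` directory should list layers 0, 1, 34 and 35.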
Sample output
The included sample_outputs/lake_q4_sel.png was generated from the prompt "a serene mountain lake at sunrise, mist rising off the water, photographic, Kodak Tri-X 400" with seed=42 at 2048×2048 on this Q4-selective port.
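Brightness collapse is easy to spot by eye, but a number helps when comparing variants. A minimal sketch, assuming ITU-R BT.601 luma is a good enough proxy for perceived brightness: `mean_luma` averages luma over 8-bit RGB pixels, and the hypothetical `image_mean_luma` helper does the same for an image file through Pillow (already in the install list), whose "L" conversion uses the same weights. This is an illustrative metric, not necessarily the one behind the table's Brightness row.

```python
from typing import Iterable, Tuple

# ITU-R BT.601 luma weights (the same ones Pillow uses for RGB -> "L")
R_W, G_W, B_W = 0.299, 0.587, 0.114


def mean_luma(pixels: Iterable[Tuple[int, int, int]]) -> float:
    """Average BT.601 luma (0-255) over an iterable of 8-bit RGB pixels."""
    total, count = 0.0, 0
    for r, g, b in pixels:
        total += R_W * r + G_W * g + B_W * b
        count += 1
    if count == 0:
        raise ValueError("no pixels")
    return total / count


def image_mean_luma(path: str) -> float:
    """Mean luma of an image file, via Pillow's 'L' conversion (integer-rounded)."""
    from PIL import Image, ImageStat  # imported lazily so mean_luma works without Pillow

    with Image.open(path) as im:
        return ImageStat.Stat(im.convert("L")).mean[0]
```

Calling `image_mean_luma("out.png")` on outputs from the same prompt and seed gives one number per variant; a markedly lower value for a Q4 output than for its BF16 counterpart is the kind of drop the table describes as collapsed.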
Reproduce the conversion
The quantize_selective.py script in this repo re-quantizes from a local BF16 MLX export. It is useful if you want to tune which layers stay in BF16: for example 0,1,2,3,32,33,34,35 for an even cleaner result at about +2 GB, or 0,17,34 for a cheaper variant.
```bash
# Starting from mlx-community/HiDream-O1-Image-Dev-mlx-bf16 already cloned in ./bf16-source
python quantize_selective.py \
  --bf16-source ./bf16-source \
  --out-dir ./q4-selective \
  --keep-bf16-layers 0,1,34,35
```
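The disk cost of a different keep list can be estimated from the benchmark table alone. A rough sketch, assuming every decoder layer has the same parameter count so that each layer kept in BF16 adds the same amount: the per-layer delta comes from the table (6.7 GB minus 5.7 GB, over 4 layers), and the 4.5 bits/weight for Q4 assumes one BF16 scale and one BF16 bias per group of 64. `parse_keep` and `estimate_disk_gb` are hypothetical helpers, not part of quantize_selective.py.

```python
NUM_LAYERS = 36
PURE_Q4_GB = 5.7          # "Disk size", Pure Q4 column
Q4_SELECTIVE_GB = 6.7     # "Disk size", Q4-selective column (4 layers kept in BF16)
DEFAULT_KEEP = 4

BF16_BITS = 16.0
Q4_BITS = 4 + 2 * 16 / 64  # 4-bit codes + BF16 scale and bias per 64-weight group

# Extra disk space per layer when it stays BF16 instead of Q4 (assumed uniform)
DELTA_PER_LAYER_GB = (Q4_SELECTIVE_GB - PURE_Q4_GB) / DEFAULT_KEEP


def parse_keep(spec: str) -> list[int]:
    """Parse a --keep-bf16-layers style value such as '0,1,34,35'."""
    layers = sorted({int(tok) for tok in spec.split(",") if tok.strip()})
    bad = [i for i in layers if not 0 <= i < NUM_LAYERS]
    if bad:
        raise ValueError(f"layer indices out of range 0..{NUM_LAYERS - 1}: {bad}")
    return layers


def estimate_disk_gb(spec: str) -> float:
    """Linear estimate: pure Q4 size plus a fixed delta per BF16 layer."""
    return PURE_Q4_GB + DELTA_PER_LAYER_GB * len(parse_keep(spec))


def implied_layer_bf16_gb() -> float:
    """BF16 size of one layer implied by the delta: delta = size * (1 - 4.5/16)."""
    return DELTA_PER_LAYER_GB / (1 - Q4_BITS / BF16_BITS)


if __name__ == "__main__":
    for spec in ("0,1,34,35", "0,1,2,3,32,33,34,35", "0,17,34"):
        print(f"{spec:>22}: ~{estimate_disk_gb(spec):.2f} GB")
```

Under these assumptions the 8-layer list comes out at about 7.7 GB (+2 GB over pure Q4, consistent with the figure quoted above), 0,17,34 at about 6.45 GB, and the implied BF16 size of one decoder layer is roughly 0.35 GB.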
Why selective quantization works here
The HiDream-O1 decoder has 36 transformer layers. When every layer is quantized to Q4 with group_size=64, the per-group rounding error compounds through the residual stream, layer after layer. The intermediate hidden state drifts away from its BF16 values and the output brightness drifts toward zero, which shows up as dark/moody images.
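To make "per-group rounding error" concrete, here is a minimal pure-Python sketch of affine 4-bit quantization with group_size=64: each group of 64 weights gets its own scale and offset, values are rounded to one of 16 levels, and each weight's round-trip error is at most half a quantization step. This is a simplified scheme in the spirit of MLX's affine mode, not MLX's exact kernel.

```python
import random


def quantize_group(w: list[float], bits: int = 4) -> tuple[list[int], float, float]:
    """Affine-quantize one group: w ~= q * scale + bias, with q in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels if hi > lo else 1.0  # guard constant groups
    q = [min(levels, max(0, round((x - lo) / scale))) for x in w]
    return q, scale, lo


def dequantize_group(q: list[int], scale: float, bias: float) -> list[float]:
    return [v * scale + bias for v in q]


def roundtrip(row: list[float], group_size: int = 64, bits: int = 4) -> list[float]:
    """Quantize then dequantize a weight row group by group, as a Q4 layer sees it."""
    out: list[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        out.extend(dequantize_group(*quantize_group(group, bits)))
    return out


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(4096)]  # toy weight row
    err = [abs(a - b) for a, b in zip(row, roundtrip(row))]
    print(f"max |error| = {max(err):.2e}, mean |error| = {sum(err) / len(err):.2e}")
```

The error of a single weight is small and bounded, but every quantized layer adds its own perturbation to the hidden state, which is the accumulation described above.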
By keeping the first 2 and last 2 decoder layers in BF16:
- Layers 0-1 (input layers) see the unquantized text embeddings and image patches and produce clean early features
- Layers 34-35 (output layers) project the final hidden state to the patch-prediction space without rounding loss
- The 32 middle layers can absorb the Q4 noise without it bleeding into the input/output boundary
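The layer-selection rule above can be written as a predicate for MLX's `mlx.nn.quantize`, whose `class_predicate` receives each module's path and the module itself. This is a sketch of one way to do it, not the contents of quantize_selective.py; the `layers.<index>` path pattern is an assumption about how HiDream-O1's decoder modules are named.

```python
import re

KEEP_BF16 = frozenset({0, 1, 34, 35})  # first 2 and last 2 of the 36 decoder layers
# Assumption: decoder submodules have paths like "model.layers.12.mlp.down_proj"
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)(?:\.|$)")


def should_quantize(path: str, module: object) -> bool:
    """Return True if the module at `path` should become Q4, False to keep it BF16."""
    if not hasattr(module, "to_quantized"):  # only quantizable modules (e.g. nn.Linear)
        return False
    match = LAYER_RE.search(path)
    return not (match and int(match.group(1)) in KEEP_BF16)


# Intended usage (requires mlx and a loaded BF16 model, not run here):
#   import mlx.nn as nn
#   nn.quantize(model, group_size=64, bits=4, class_predicate=should_quantize)
```

Anchoring the regex on dots keeps index 35 from matching inside a longer index such as 135, and checking `to_quantized` mirrors how MLX decides which modules can be quantized at all.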
This approach is documented and reproducible with quantize_selective.py. The upstream MLX port author flagged the brightness collapse but rejected Q4 outright; selective quantization makes Q4 viable again.
License
MIT, matching upstream HiDream-ai/HiDream-O1-Image-Dev.
Provenance
- Base model: HiDream-ai/HiDream-O1-Image-Dev (MIT)
- BF16 MLX port: mlx-community/HiDream-O1-Image-Dev-mlx-bf16 (MIT)
- Q4-selective port: this repo, by Olivier Dupont (HF: ambassadia), 2026-05-10