Zen5 Max
Top tier of the Zen5 family: the full Pro base model, asymmetrically quantized. Routed experts use IQ2_XXS for the up/gate projections and Q2_K for the down projection; shared experts, attention projections, routing logits, and the LM head stay at higher precision.
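To make the asymmetric split concrete, the following sketch computes bits per weight from the block layouts ggml uses for these types (IQ2_XXS: 66 bytes per 256 weights; Q2_K: 84 bytes per 256; Q8_0: 34 bytes per 32). It is illustrative arithmetic, not code from zen5-engine. The names `BLOCKS`, `bits_per_weight`, `tensor_bytes`, and `routed_expert_bpw` are hypothetical, and the numbers ignore GGUF metadata and any tensors kept at F16/F32.

```python
# Hypothetical helpers: bits-per-weight arithmetic for the quant types named in this card.
# (bytes per block, weights per block), following ggml's block layouts (QK_K = 256).
BLOCKS = {
    "IQ2_XXS": (66, 256),  # fp16 scale + 64 B of packed grid indices, signs and sub-scales
    "Q2_K": (84, 256),     # 16 B sub-block scales/mins + 64 B of 2-bit quants + fp16 d, dmin
    "Q8_0": (34, 32),      # fp16 scale + 32 x int8
}


def bits_per_weight(qtype: str) -> float:
    """Effective storage cost of one weight, including per-block scales."""
    nbytes, nweights = BLOCKS[qtype]
    return nbytes * 8 / nweights


def tensor_bytes(n_weights: int, qtype: str) -> int:
    """Bytes needed for a tensor whose size is a whole number of blocks."""
    nbytes, nweights = BLOCKS[qtype]
    if n_weights % nweights:
        raise ValueError(f"{n_weights} weights is not a multiple of the {qtype} block size")
    return n_weights // nweights * nbytes


def routed_expert_bpw() -> float:
    """Average over one routed expert: up and gate at IQ2_XXS, down at Q2_K.

    The three matrices of a gated FFN expert hold the same number of weights
    (d_model x d_ff), so a plain average is the weighted average.
    """
    return (2 * bits_per_weight("IQ2_XXS") + bits_per_weight("Q2_K")) / 3


if __name__ == "__main__":
    for name in BLOCKS:
        print(f"{name}: {bits_per_weight(name)} bits/weight")
    print(f"routed expert average: {routed_expert_bpw()} bits/weight")
```

Under these layouts a routed expert averages (2 × 2.0625 + 2.625) / 3 = 2.25 bits per weight, while the Q8_0 tensors cost 8.5, which is why keeping only the small shared and attention tensors at higher precision adds little to the total size.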
Use when you have 512 GB+ unified memory (Mac Studio M3 Ultra 512 GB) or an 8x H100 / H200 pool and want the deepest reasoning quality in the family. For 128 GB hardware, use zenlm/zen-5-pro-gguf instead.
Part of the canonical Zen5 ladder:
| SKU | Hardware fit | Repo |
|---|---|---|
| zen5-flash | Anything | zen-5-flash-gguf |
| zen5-mini | 32 GB | zen-5-mini-gguf |
| zen5 (default) | 24 GB+ VRAM | zen-5-gguf |
| zen5-pro | 128 GB single machine | zen-5-pro-gguf |
| zen5-max | 512 GB Mac Studio / 8x H100 | zen-5-max-gguf (← you are here) |
Files
| File pattern | Size | Quantization |
|---|---|---|
| Main GGUF (`*-IQ2XXS-w2Q2K-*-Instruct-imatrix.gguf`) | 432 GB | Routed IQ2_XXS + Q2_K, shared experts Q8_0, attention Q8_0, imatrix-tuned |
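The hardware guidance follows from simple subtraction: whatever memory remains after the 432 GB of weights is what the KV cache, activations, and the operating system must share. A rough sketch, assuming decimal GB throughout and 80 GB per H100 / 141 GB per H200; `headroom_gb` and `HARDWARE_GB` are hypothetical names, and real usable memory is lower once OS reservations and engine buffers are counted.

```python
# Hypothetical fit check: memory left over after loading the main GGUF.
WEIGHTS_GB = 432  # size of the main GGUF, from the Files table

# Assumed capacities in decimal GB (per-GPU HBM times GPU count for the pools).
HARDWARE_GB = {
    "Mac Studio M3 Ultra (512 GB unified)": 512,
    "8x H100 80 GB": 8 * 80,
    "8x H200 141 GB": 8 * 141,
    "128 GB single machine": 128,
}


def headroom_gb(total_gb: int, weights_gb: int = WEIGHTS_GB) -> int:
    """Memory remaining for KV cache, activations and the OS; negative means it does not fit."""
    return total_gb - weights_gb


if __name__ == "__main__":
    for name, gb in HARDWARE_GB.items():
        print(f"{name}: {headroom_gb(gb):+d} GB headroom")
```

A 128 GB machine comes out hundreds of GB short, which is why that tier points to zen-5-pro-gguf instead.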
Run
Hosted via the Hanzo gateway (api.hanzo.ai) as zen5-max.
Locally, with zen5-engine:
```bash
git clone https://github.com/zenlm/zen5-engine
cd zen5-engine && make          # macOS Metal
# or: make cuda-generic         # multi-H100
hf download zenlm/zen-5-max-gguf --local-dir gguf
# Point a stable name at the first matching GGUF file
ln -sf "$(ls gguf/*-Instruct-imatrix.gguf | head -1)" zen5max.gguf
# One-shot prompt
./zen5 -m zen5max.gguf -p "Explain MoE inference."
# Server with a 1,000,000-token context and an on-disk KV cache budget of 16384 MB
./zen5-server -m zen5max.gguf --ctx 1000000 --kv-disk-dir /tmp/zen5-kv --kv-disk-space-mb 16384
```
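Before a long model load, it can help to confirm that `zen5max.gguf` resolves to an actual GGUF file. A minimal sketch, assuming the published GGUF header layout (4-byte magic `GGUF`, a little-endian uint32 version, then the tensor count and metadata key count, stored as uint32 in v1 and uint64 from v2 on). `read_gguf_header` is a hypothetical helper, not part of zen5-engine, and it checks only the header, not the tensor data.

```python
import struct
import sys

GGUF_MAGIC = b"GGUF"


def _read_exact(f, n: int) -> bytes:
    """Read exactly n bytes or fail, so a short file is reported clearly."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated GGUF header")
    return data


def read_gguf_header(path: str) -> tuple[int, int, int]:
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file header."""
    with open(path, "rb") as f:
        if _read_exact(f, 4) != GGUF_MAGIC:
            raise ValueError(f"{path}: missing GGUF magic")
        (version,) = struct.unpack("<I", _read_exact(f, 4))
        # GGUF v1 stored both counts as uint32; v2 and later use uint64.
        fmt = "<II" if version == 1 else "<QQ"
        n_tensors, n_kv = struct.unpack(fmt, _read_exact(f, struct.calcsize(fmt)))
    return version, n_tensors, n_kv


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        v, t, kv = read_gguf_header(p)
        print(f"{p}: GGUF v{v}, {t} tensors, {kv} metadata keys")
```

Saved as, say, `check_gguf.py` (a hypothetical name), it would run as `python3 check_gguf.py zen5max.gguf`; opening the path follows the symlink to the downloaded file.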
Acknowledgements
Built on deepseek-ai/DeepSeek-V4-Pro. The asymmetric routed-MoE quantization scheme, GGUF layout, imatrix calibration, and inference engine all come from Salvatore Sanfilippo's antirez/ds4 project. MIT-licensed; both antirez/ds4 and ggml-org/llama.cpp copyrights are preserved in the zen5-engine LICENSE file.