Zen5 Max

Top tier of the Zen5 family: the full Pro base with asymmetric quantization. Routed experts use IQ2_XXS for the up and gate projections and Q2_K for the down projection; shared experts, attention projections, routing logits, and the LM head are left at higher precision.
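
To make the asymmetric split concrete, here is a minimal sketch of the average bits per weight in one routed expert. The per-type figures come from llama.cpp's block layouts (bytes per block times 8, divided by weights per block). The equal-size assumption for the up, gate, and down matrices is ours and is not stated in this card.

```python
# Bits per weight for the GGUF quant types named above, derived from
# llama.cpp block layouts: bytes_per_block * 8 / weights_per_block.
BPW = {
    "IQ2_XXS": 66 * 8 / 256,  # 66-byte block, 256 weights
    "Q2_K": 84 * 8 / 256,     # 84-byte block, 256 weights
    "Q8_0": 34 * 8 / 32,      # 34-byte block, 32 weights
}


def routed_expert_bpw(up="IQ2_XXS", gate="IQ2_XXS", down="Q2_K"):
    """Average bits per weight across one routed expert.

    Assumption: the up, gate, and down matrices hold the same number
    of weights, so a plain mean is enough.
    """
    return (BPW[up] + BPW[gate] + BPW[down]) / 3


print(f"routed expert: {routed_expert_bpw():.4f} bpw")
print(f"Q8_0 tensors:  {BPW['Q8_0']:.4f} bpw")
```

Under that assumption the routed experts average 2.25 bits per weight, against 8.5 for tensors kept at Q8_0.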

Use it when you have 512 GB or more of unified memory (for example, a Mac Studio M3 Ultra with 512 GB) or an 8x H100 / H200 pool, and want the deepest reasoning quality in the family. For 128 GB hardware, use zenlm/zen-5-pro-gguf instead.

Part of the canonical Zen5 ladder:

SKU	Hardware fit	Repo
`zen5-flash`	anything	zen-5-flash-gguf
`zen5-mini`	32 GB	zen-5-mini-gguf
`zen5` (default)	24 GB+ VRAM	zen-5-gguf
`zen5-pro`	128 GB single-machine	zen-5-pro-gguf
`zen5-max`	512 GB Mac Studio / 8x H100	← you are here

Files

File pattern	Size	Quant
main GGUF (`-IQ2XXS-w2Q2K--Instruct-imatrix.gguf`)	432 GB	routed `IQ2_XXS` + `Q2_K`, shared `Q8_0`, attn `Q8_0`, imatrix-tuned
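
As a rough fit check, the sketch below subtracts the 432 GB file size from the total memory of a few configurations. The per-GPU capacities (80 GB for H100, 141 GB for H200) are assumptions, not figures from this card. The result is only an upper bound on what remains for the KV cache, activations, and the operating system.

```python
MODEL_FILE_GB = 432  # main GGUF size from the Files table

# Assumed per-device capacities; not stated in this card.
H100_GB = 80
H200_GB = 141


def headroom_gb(total_memory_gb, model_gb=MODEL_FILE_GB):
    """Memory left once the weights are resident; negative means no fit."""
    return total_memory_gb - model_gb


configs = {
    "Mac Studio M3 Ultra 512 GB": 512,
    "8x H100": 8 * H100_GB,
    "8x H200": 8 * H200_GB,
    "128 GB machine": 128,
}

for name, total in configs.items():
    print(f"{name}: {headroom_gb(total):+d} GB")
```

A negative result, as for the 128 GB machine, is the case where zen5-pro is the better choice.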

Run

Hosted via the Hanzo gateway (api.hanzo.ai) as zen5-max.

Run locally with zen5-engine:

git clone https://github.com/zenlm/zen5-engine
cd zen5-engine
make                  # macOS Metal
# make cuda-generic   # use instead of `make` for multi-H100

hf download zenlm/zen-5-max-gguf --local-dir gguf
ln -sf "$(ls gguf/*-Instruct-imatrix.gguf | head -1)" zen5max.gguf
./zen5 -m zen5max.gguf -p "Explain MoE inference."
./zen5-server -m zen5max.gguf --ctx 1000000 --kv-disk-dir /tmp/zen5-kv --kv-disk-space-mb 16384
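
Before starting a download of this size, it can help to confirm there is enough free disk space. The following is a hypothetical pre-flight check and not part of zen5-engine. The helper name `has_room` is ours, the system temp directory stands in for `/tmp/zen5-kv`, and 16384 MB is treated as roughly 16.4 GB.

```python
import shutil
import tempfile

GB = 10**9


def has_room(path, needed_gb):
    """True if the filesystem holding `path` has at least `needed_gb` free."""
    return shutil.disk_usage(path).free >= needed_gb * GB


# 432 GB for the GGUF (Files table); about 16.4 GB for --kv-disk-space-mb 16384.
# Hypothetical paths: adjust to wherever you download and cache.
if not has_room(".", 432):
    print("not enough free space here for the 432 GB GGUF")
if not has_room(tempfile.gettempdir(), 16384 / 1000):
    print("not enough free space in the temp directory for the KV disk cache")
```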

Acknowledgements

Built on deepseek-ai/DeepSeek-V4-Pro. The asymmetric routed-MoE quantization scheme, GGUF layout, imatrix calibration, and inference engine all come from Salvatore Sanfilippo's antirez/ds4 project. MIT-licensed; both antirez/ds4 and ggml-org/llama.cpp copyrights are preserved in the zen5-engine LICENSE file.
