GLM-5.2-504B-K — knowledge-augmented REAP keep-168 (full-data Router-KD, NVFP4)

The "K-cut" sibling of 0xSero/GLM-5.2-504B: the same 504B / keep-168 budget, but the expert selection is biased toward knowledge & reasoning — the winning top-160 core (kept bit-for-bit) plus the 8 highest-priority knowledge-exclusive experts per layer that coding-saliency pruning drops. Recovered with gate-only **Router-KD trained on the FULL calibration set (18.6k real traces)** — 6x the data of the first-pass cuts.

Sponsor

8x NVIDIA B200 sponsored by Lambda. Thank you.

Why this variant exists

REAP saliency computed from coding traces under-weights experts that fire mainly on reasoning/knowledge. The K-cut deliberately re-includes them — trading a sliver of coding-saliency coverage for broader knowledge coverage. Reach for this on knowledge/reasoning-heavy workloads; use the plain GLM-5.2-504B otherwise.

Eval (n=2000 held-out real prompts, raw, no max_tokens / no timeout)

metric GLM-5.2-504B-K (this) GLM-5.2-504B (plain floor)
attractor / loop rate 0.078 0.072
natural-EOS rate 0.923 0.928
distinct-4 0.881 0.880
median tokens 1232 1267

Serving (vLLM)

vllm serve 0xSero/GLM-5.2-504B-K --tensor-parallel-size 8 \
  --quantization modelopt_fp4 --kv-cache-dtype fp8 --trust-remote-code --max-model-len 262144

REAP knowledge-augmented cut + full-data Router-KD. Compute sponsored by Lambda.

Honest note (n=2000)

The unpruned teacher loops on only 3.6% of these prompts vs ~7-8% for this pruned cut — REAP pruning roughly doubles the loop rate, and gate-only Router-KD (even on full data) does not close it. Earlier small-n evals suggesting parity were a sampling fluke. A knowledge-recovery LoRA is in progress to add capacity back.

Downloads last month
126
Safetensors
Model size
292B params
Tensor type
BF16
·
F32
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for 0xSero/GLM-5.2-504B-K

Base model

zai-org/GLM-5.2
Quantized
(80)
this model

Collection including 0xSero/GLM-5.2-504B-K