
GLM-5.2-Int8Mix-NVFP4-REAP-594B

Benchmarks in: GPQA Diamond 86.87 (≈97% of full NVFP4) · SciCode 47.77 (≈1.3 pts under full NVFP4). IFBench / τ²-Bench Telecom pending.

A REAP-pruned (≈22% of experts removed) Int8-mix NVFP4 quantization of GLM-5.2, ≈594B parameters.

Evaluation

Measured under NVIDIA's evaluation protocol: temperature=1.0, top_p=0.95; GPQA Diamond used max_new_tokens=100000, the others used max_new_tokens=64000 (SciCode via the official inspect_ai scorer, with-background). Full-model rows are NVIDIA's published figures for the unpruned GLM-5.2; the REAP rows are measured with reap-bench. Intelligence lost = relative drop vs full NVFP4 (same quantization, so the comparison isolates the prune itself).

| Model | GPQA Diamond | SciCode | IFBench | τ²-Bench Telecom |
|---|---|---|---|---|
| GLM-5.2 FP8, full (NVIDIA ref) | 89.52 | 49.85 | 74.95 | 97.9 |
| GLM-5.2 NVFP4, full (NVIDIA ref) | 89.39 | 49.04 | 75.81 | 98.25 |
| GLM-5.2-Int8Mix-NVFP4-REAP-594B (this model) · ~22% prune | 86.87 | 47.77 | - | - |
| ↳ intelligence lost vs full NVFP4 | −2.8% | −2.6% | - | - |
| GLM-5.2-NVFP4-REAP-504B-term · ~34% prune | - | 44.67 | - | - |
| ↳ intelligence lost vs full NVFP4 | - | −8.9% | - | - |
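The "intelligence lost" rows can be reproduced from the scores in the table. The following is a minimal sketch, assuming the metric is the plain relative change against the full NVFP4 score, rounded to one decimal; the names `relative_drop` and `drops` are illustrative and not part of reap-bench.

```python
# Scores copied from the table above (percent).
FULL_NVFP4 = {"GPQA Diamond": 89.39, "SciCode": 49.04}
REAP_594B = {"GPQA Diamond": 86.87, "SciCode": 47.77}
REAP_504B_TERM = {"SciCode": 44.67}


def relative_drop(pruned: float, full: float) -> float:
    """Relative change in percent versus the full model (negative means worse)."""
    return (pruned - full) / full * 100.0


def drops(pruned_scores: dict, full_scores: dict) -> dict:
    """Relative change per benchmark, rounded to one decimal place."""
    return {
        name: round(relative_drop(score, full_scores[name]), 1)
        for name, score in pruned_scores.items()
    }


if __name__ == "__main__":
    print("REAP-594B:", drops(REAP_594B, FULL_NVFP4))
    print("REAP-504B-term:", drops(REAP_504B_TERM, FULL_NVFP4))
```

Because both the numerator and the reference use the same NVFP4 quantization, the resulting percentage reflects the prune rather than the quantization format.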

GPQA Diamond: 172/198 correct, 0 errors (reasoning_effort=max). SciCode (with-background): 139/291 subproblems = 47.77%, 11/65 problems fully solved (16.92%), 65/65 samples, 0 errors. So far ≈97% of the full NVFP4 model's measured intelligence is retained for an ≈22% expert prune. On SciCode, the only benchmark for which the table reports both pruned models, the 594B clearly beats the more aggressively pruned REAP-504B-term (168 experts): 47.77 vs 44.67. IFBench / τ²-Bench Telecom pending.
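As a sanity check, the headline percentages follow directly from the raw counts quoted above. This is a small sketch that assumes simple accuracy (correct divided by total) and retention (pruned score divided by full NVFP4 score); the helper names `pct` and `retained` are hypothetical.

```python
def pct(correct: int, total: int) -> float:
    """Accuracy in percent, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def retained(pruned: float, full: float) -> float:
    """Share of the full model's score that the pruned model keeps, in percent."""
    return round(100.0 * pruned / full, 1)


gpqa = pct(172, 198)               # GPQA Diamond accuracy
scicode_sub = pct(139, 291)        # SciCode subproblem accuracy
scicode_problems = pct(11, 65)     # SciCode problems fully solved

if __name__ == "__main__":
    print("GPQA Diamond:", gpqa)
    print("SciCode subproblems:", scicode_sub)
    print("SciCode full problems:", scicode_problems)
    # Full NVFP4 reference scores are NVIDIA's published figures from the table.
    print("GPQA retained vs full NVFP4:", retained(gpqa, 89.39))
    print("SciCode retained vs full NVFP4:", retained(scicode_sub, 49.04))
```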

Datasets: GPQA Diamond (gpqa_diamond.csv, 198 Q), Rein et al., arXiv:2311.12022. SciCode via the official inspect_ai harness. Harness: reap-bench.

Safetensors · Model size: 327B params · Tensor types: BF16, I64, I32, F32, F8_E4M3, U8