GLM-5.2-MLX-nvfp4

An MLX conversion of zai-org/GLM-5.2, quantized to NVFP4 (4-bit FP4, group size 16), for running on Apple Silicon with mlx-lm.

This is the MLX counterpart of NVIDIA's nvidia/GLM-5.2-NVFP4. NVIDIA's checkpoint stores its weights in ModelOpt-packed NVFP4, a layout that mlx-lm cannot read directly. This build was therefore produced by quantizing the BF16 base model with MLX's own NVFP4 mode (--q-mode nvfp4 --q-group-size 16).
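To illustrate what "4-bit FP4, group size 16" means, here is a simplified sketch of NVFP4-style group quantization. It is not MLX's kernel. It assumes that elements use the FP4 E2M1 encoding and that each group of 16 weights shares one scale. The real format stores that scale as FP8 (E4M3); this sketch keeps it as a plain Python float and omits any per-tensor scale. The helper names (quantize_group, dequantize_group, quantize) are hypothetical.

```python
# Simplified NVFP4-style group quantization (illustrative sketch, not MLX's kernel).
# Assumption: elements use FP4 E2M1; the per-group scale is a Python float here,
# whereas the real format stores it as FP8 (E4M3).

# Non-negative E2M1 values, listed in order of their 3-bit exponent/mantissa pattern.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = E2M1_GRID[-1]
GROUP_SIZE = 16


def quantize_group(values):
    """Return (4-bit codes, shared scale) for one group of floats."""
    amax = max(abs(v) for v in values)
    # Scale so the largest magnitude in the group lands on the largest FP4 value.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Nearest representable magnitude; ties go to the smaller one.
        idx = min(range(len(E2M1_GRID)), key=lambda i: abs(E2M1_GRID[i] - target))
        sign = 1 if v < 0 else 0
        codes.append((sign << 3) | idx)  # sign bit + 3-bit magnitude pattern
    return codes, scale


def dequantize_group(codes, scale):
    """Map 4-bit codes back to floats using the group's shared scale."""
    out = []
    for c in codes:
        mag = E2M1_GRID[c & 0b111] * scale
        out.append(-mag if c >> 3 else mag)
    return out


def quantize(weights, group_size=GROUP_SIZE):
    """Split a flat weight list into groups of `group_size` and quantize each."""
    if len(weights) % group_size:
        raise ValueError("weight count must be a multiple of the group size")
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


if __name__ == "__main__":
    row = [0.05 * i for i in range(-16, 16)]  # 32 weights -> 2 groups of 16
    for g, (codes, scale) in enumerate(quantize(row)):
        original = row[g * GROUP_SIZE:(g + 1) * GROUP_SIZE]
        approx = dequantize_group(codes, scale)
        err = max(abs(a - b) for a, b in zip(original, approx))
        print(f"group {g}: scale={scale:.4f}, max abs error={err:.4f}")
```

Each group of 16 has its own scale, so a large outlier weight coarsens only the 15 weights that share its group rather than the whole tensor. This is the main reason for using such a small group size.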

  • Base model: zai-org/GLM-5.2 (GlmMoeDsaForCausalLM, 753B total / ~40B active MoE, text-only)
  • Format: MLX, NVFP4 (4-bit FP4, group size 16)
  • Approx. size on disk: 390G
  • Converted with: mlx-lm 0.31.2
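The listed on-disk size can be sanity-checked with simple arithmetic. The sketch below is a rough estimate that assumes all 753B parameters are stored as a 4-bit value plus an 8-bit scale shared by each group of 16, which works out to 4.5 bits per weight. It ignores tensors kept in higher precision and file overhead. The function name estimate_nvfp4_bytes is hypothetical.

```python
def estimate_nvfp4_bytes(n_params, value_bits=4, scale_bits=8, group_size=16):
    """Rough checkpoint size if every parameter is stored as NVFP4-style groups."""
    # Each weight costs its 4-bit code plus 1/16 of the group's 8-bit scale.
    bits_per_weight = value_bits + scale_bits / group_size
    return n_params * bits_per_weight / 8


size = estimate_nvfp4_bytes(753e9)
print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")  # 423.6 GB (394.5 GiB)
```

The estimate of about 394 GiB is in the same range as the listed 390G. The two are not expected to match exactly, because this estimate does not model which tensors the converter actually quantizes.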

Usage

pip install -U mlx-lm
mlx_lm.generate --model pipenetwork/GLM-5.2-MLX-nvfp4 --prompt "Explain mixture-of-experts in one sentence." --max-tokens 128

License

MIT, inherited from the base model.
