Model Overview

  • Model Architecture: GLM-5.2
    • Input: Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0.0
  • PyTorch: 2.9.0
  • Transformers: 5.8.1
  • Operating System(s): Linux
  • Inference Engine: SGLang/vLLM
  • Model Optimizer: AMD-Quark (V0.11)
    • Weight quantization: MOE-only (shared experts quantized), OCP MXFP4, Static
    • Activation quantization: MOE-only, OCP MXFP4, Dynamic

This model was built with GLM-5.2 model by applying AMD-Quark for MXFP4 quantization.

Model Quantization

The model was quantized from zai-org/GLM-5.2 using AMD-Quark. The weights and activations are quantized to MXFP4.

Quantization scripts:

cd Quark/examples/torch/language_modeling/llm_ptq/
  python quantize_quark.py \
      --model_dir zai-org/GLM-5.2 \
      --output_dir GLM-5.2-MXFP4 \
      --quant_scheme mxfp4 \
      --exclude_layers "*self_attn*" "*mlp.gate" "*lm_head" \
          "*mlp.gate_proj" "*mlp.up_proj" "*mlp.down_proj" \
          "*layers.78.*" \  # Exclude the MTP layer (layer 78)
      --file2file_quantization

Deployment

Use with SGLang/vLLM

This model can be deployed efficiently using the SGLang or vLLM backends.

Evaluation

The model was evaluated on GSM8K benchmarks.

Accuracy

Benchmark GLM-5.2 GLM-5.2-MXFP4(this model) Recovery
GSM8K (flexible-extract) 0.9409 0.9393 99.8%

Reproduction

The GSM8K results were obtained using the lm-evaluation-harness framework, based on the Docker image lmsysorg/sglang:v0.5.13.post1-rocm700-mi35x, with SGLang pre-installed inside the image and lm-eval compiled and installed from source.

lm_eval --model sglang \
    --model_args pretrained=amd/GLM-5.2-MXFP4,tp_size=4 \
    --tasks gsm8k \
    --batch_size auto

The Docker image rocm/vllm-dev:nightly_main_20260616 with vLLM pre-installed can also be used for reproducing using vLLM backend.

License

Modifications Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.

Downloads last month
6
Safetensors
Model size
412B params
Tensor type
U8
F32
BF16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for amd/GLM-5.2-MXFP4

Base model

zai-org/GLM-5.2
Quantized
(24)
this model