Model Overview

Description

An FP8-quantized version of QwQ-32B.

Evaluation

The results in the table below are based on the MMLU benchmark.

To speed up evaluation, we limit how long a chain of thought the model may generate, so the scores may differ from those obtained with longer reasoning chains.

In our experiments, the accuracy of the FP8 quantized version is nearly identical to that of the BF16 version, while allowing faster inference.
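For reference, the sketch below shows one way such a length cap can be applied when querying the model with Hugging Face transformers. It is a minimal illustration only: the prompt, the 512-token cap, and the assumption that the FP8 checkpoint loads directly through transformers (it may require an additional quantization backend) are not the exact evaluation harness behind the scores reported here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative settings; the prompt and token cap are assumptions,
# not the exact configuration used for the reported MMLU scores.
model_id = "qingcheng-ai/QWQ-32B-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

question = (
    "Which of the following is the derivative of x^2?\n"
    "A. 2x  B. x  C. x^2  D. 2\n"
    "Answer with the letter of the correct choice."
)
messages = [{"role": "user", "content": question}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Cap the chain-of-thought length so a single question cannot run indefinitely.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```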

| Data Format    | MMLU Score |
|----------------|------------|
| BF16 (official)| 61.2       |
| FP8 (quantized)| 61.2       |
| Q8_0 (INT8)    | 59.1       |
| AWQ (INT4)     | 53.4       |
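For faster inference with the FP8 checkpoint, a serving engine such as vLLM can be used. The snippet below is a hypothetical sketch: it assumes a vLLM build and GPU with FP8 support, and the tensor parallel size, sampling parameters, and prompt are placeholders to be adapted to your setup.

```python
from vllm import LLM, SamplingParams

# Hypothetical serving sketch; adjust tensor_parallel_size to your hardware
# and verify that your vLLM build supports FP8 checkpoints.
llm = LLM(model="qingcheng-ai/QWQ-32B-FP8", tensor_parallel_size=2)

# max_tokens bounds the length of the generated reasoning chain.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)
outputs = llm.generate(["How many prime numbers are there below 100?"], params)
print(outputs[0].outputs[0].text)
```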

Contact

solution@qingcheng.ai
