FP8 LLMs for vLLM
Collection
Accurate FP8 quantized models by Neural Magic, ready for use with vLLM!
Mixtral-8x22B-Instruct-v0.1 quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
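
As a rough illustration of running an FP8 checkpoint like this one with vLLM, the sketch below wraps vLLM's offline `LLM` and `SamplingParams` interface. The repository ID, the `generate` helper, and the `tensor_parallel_size` default are assumptions made for illustration, not values taken from this card; adjust them to your checkpoint location and hardware.

```python
# Hypothetical repository ID (an assumption); substitute the real checkpoint path.
MODEL_ID = "neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8"


def generate(prompts, model_id=MODEL_ID, tensor_parallel_size=4, max_tokens=128):
    """Run greedy offline generation with vLLM and return the completions.

    tensor_parallel_size=4 is an assumed default; choose it to match the
    number of GPUs available.
    """
    # Imported lazily so this module loads even where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id, tensor_parallel_size=tensor_parallel_size)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each result holds one or more completions; keep the first one's text.
    return [out.outputs[0].text for out in outputs]
```

Calling `generate(["What is FP8 quantization?"])` would return a list with one completion string. Prompts are passed through as-is here, so applying the model's chat template is left to the caller.
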
Produced using AutoFP8 with calibration samples from ultrachat.
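
To make "per-tensor quantization" concrete, here is a minimal pure-Python sketch of the idea: a single scale is chosen for the whole tensor so that its largest magnitude maps to the FP8 maximum, and every value is then rounded to the FP8 grid. It assumes the E4M3 format (maximum finite value 448) and derives the scale from the tensor itself rather than from calibration samples, so it is not AutoFP8's actual implementation; all function names are illustrative.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value (the format is an assumption)


def round_to_e4m3(x):
    """Round x to the nearest E4M3-representable value, saturating at the max."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    # Magnitudes below 2**-6 are subnormal and share the smallest step size.
    exp = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits give 8 steps per binade
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with one shared scale for the whole tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]
```

With a single scale per tensor, the largest value is represented almost exactly, while small values in a tensor with large outliers lose relative precision. Calibration data, such as the ultrachat samples mentioned above, is typically used to pick activation scales ahead of time instead of computing them per input.
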
| Benchmark | Mixtral-8x22B-Instruct-v0.1 | Mixtral-8x22B-Instruct-v0.1-FP8 (this model) |
|---|---|---|
| arc-c 25-shot (acc_norm) | 72.70 | 72.53 |
| hellaswag 10-shot (acc_norm) | 89.08 | 88.10 |
| mmlu 5-shot | 77.77 | 76.08 |
| truthfulqa 0-shot (acc) | 68.14 | 66.32 |
| winogrande 5-shot (acc) | 85.16 | 84.37 |
| gsm8k 5-shot (strict-match) | 82.03 | 83.40 |
| Average Accuracy | 79.15 | 78.47 |
| Recovery | 100% | 99.14% |
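
The summary rows can be reproduced from the per-benchmark scores. The sketch below assumes that "Average Accuracy" is the unweighted mean of the six benchmarks and that "Recovery" is the FP8 average divided by the baseline average. The card does not state these definitions, but they are consistent with the reported numbers. The helper names are illustrative.

```python
# Scores copied from the table above.
baseline = {
    "arc-c": 72.70, "hellaswag": 89.08, "mmlu": 77.77,
    "truthfulqa": 68.14, "winogrande": 85.16, "gsm8k": 82.03,
}
fp8 = {
    "arc-c": 72.53, "hellaswag": 88.10, "mmlu": 76.08,
    "truthfulqa": 66.32, "winogrande": 84.37, "gsm8k": 83.40,
}


def average(scores):
    """Unweighted mean over all benchmarks (assumed definition)."""
    return sum(scores.values()) / len(scores)


def recovery(quantized, reference):
    """Quantized average as a percentage of the reference average (assumed definition)."""
    return 100.0 * average(quantized) / average(reference)


print(f"baseline average: {average(baseline):.2f}")
print(f"FP8 average:      {average(fp8):.2f}")
print(f"recovery:         {recovery(fp8, baseline):.2f}%")
```
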