---
tags:
  - fp8
  - vllm
---

# Mixtral-8x22B-Instruct-v0.1-FP8

## Model Overview

Mixtral-8x22B-Instruct-v0.1 quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
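The checkpoint can be loaded with vLLM's offline inference API. A minimal sketch, assuming the Hugging Face repo id matches this card's title and that you have enough GPUs for an 8x22B model (adjust the repo id and `tensor_parallel_size` for your setup):

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id, assumed from this card's title; replace with the
# actual Hugging Face path for this model.
model_id = "Mixtral-8x22B-Instruct-v0.1-FP8"

# tensor_parallel_size is an assumption; size it to your GPU count.
llm = LLM(model=model_id, tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```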

## Usage and Creation

Produced using AutoFP8 with calibration samples from the UltraChat dataset.
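Per-tensor quantization means a single FP8 scale is shared by an entire weight or activation tensor. A minimal NumPy sketch of the scaling step (hypothetical helper names; real FP8 kernels also round each value to an 8-bit E4M3 float, which this sketch does not model):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def per_tensor_scale(x: np.ndarray) -> float:
    """One scale for the whole tensor: map its max magnitude to the FP8 range."""
    return FP8_E4M3_MAX / np.abs(x).max()

def fake_quantize(x: np.ndarray) -> np.ndarray:
    """Scale into FP8 range, clip, then rescale back (dequantize)."""
    s = per_tensor_scale(x)
    q = np.clip(x * s, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q / s

w = np.random.randn(4, 8).astype(np.float32)
w_hat = fake_quantize(w)
```

AutoFP8 computes activation scales the same way, but from statistics gathered over the calibration samples rather than from a single tensor.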

## Evaluation

### Open LLM Leaderboard evaluation scores

| Benchmark | Mixtral-8x22B-Instruct-v0.1 | Mixtral-8x22B-Instruct-v0.1-FP8 (this model) |
| --- | --- | --- |
| arc-c (25-shot) | 72.70 | 69.19 |
| hellaswag (10-shot) | 89.08 | 82.49 |
| mmlu (5-shot) | 77.77 | 70.61 |
| truthfulqa (0-shot) | 68.14 | 65.73 |
| winogrande (5-shot) | 85.16 | 82.63 |
| gsm8k (5-shot) | 82.03 | 76.57 |
| **Average accuracy** | 79.15 | 74.53 |
| **Recovery** | 100% | 94.17% |
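Recovery is the FP8 model's average accuracy as a fraction of the baseline's. A quick check from the per-task scores above (matching the card's figures up to rounding):

```python
# Per-task scores copied from the table above.
baseline = [72.70, 89.08, 77.77, 68.14, 85.16, 82.03]
fp8 = [69.19, 82.49, 70.61, 65.73, 82.63, 76.57]

avg_baseline = sum(baseline) / len(baseline)  # ~79.15
avg_fp8 = sum(fp8) / len(fp8)                 # ~74.54
recovery = 100 * avg_fp8 / avg_baseline       # ~94.2%

print(f"{avg_baseline:.2f} {avg_fp8:.2f} {recovery:.2f}%")
```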