
Mixtral-8x22B-Instruct-v0.1-FP8

Model Overview

Mixtral-8x22B-Instruct-v0.1 quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
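
The checkpoint can be served directly with vLLM. Below is a minimal offline-inference sketch; the prompt, sampling settings, and `tensor_parallel_size` are illustrative assumptions rather than part of this card.

```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint; vLLM >= 0.5.0 reads the FP8 quantization
# config stored in the checkpoint automatically.
# tensor_parallel_size=8 is an assumption, size it to your GPUs.
llm = LLM(
    model="neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8",
    tensor_parallel_size=8,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Explain FP8 quantization in one sentence."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```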

Usage and Creation

This model was produced with AutoFP8, using calibration samples drawn from the UltraChat dataset. A sketch of the quantization flow is shown below.
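
The sketch follows the standard AutoFP8 example workflow; the exact dataset split, number of calibration samples, and sequence length are assumptions, not the precise recipe used for this checkpoint.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"
quantized_model_dir = "Mixtral-8x22B-Instruct-v0.1-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Calibration data: chat samples rendered through the model's chat template.
# The dataset split and sample count (512) are assumptions.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft").select(range(512))
examples = [
    tokenizer.apply_chat_template(sample["messages"], tokenize=False) for sample in ds
]
examples = tokenizer(
    examples, padding=True, truncation=True, max_length=2048, return_tensors="pt"
).to("cuda")

# Per-tensor FP8 quantization of weights and activations, with static
# activation scales calibrated on the samples above.
quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="static",
)

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_id, quantize_config=quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```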

Evaluation

Open LLM Leaderboard evaluation scores

| Benchmark                      | Mixtral-8x22B-Instruct-v0.1 | Mixtral-8x22B-Instruct-v0.1-FP8 (this model) |
|--------------------------------|-----------------------------|----------------------------------------------|
| arc-c, 25-shot (acc_norm)      | 72.70                       | 72.53                                        |
| hellaswag, 10-shot (acc_norm)  | 89.08                       | 88.10                                        |
| mmlu, 5-shot                   | 77.77                       | 76.08                                        |
| truthfulqa, 0-shot (acc)       | 68.14                       | 66.32                                        |
| winogrande, 5-shot (acc)       | 85.16                       | 84.37                                        |
| gsm8k, 5-shot (strict-match)   | 82.03                       | 83.40                                        |
| Average accuracy               | 79.15                       | 78.47                                        |
| Recovery                       | 100%                        | 99.14%                                       |