Mixtral-8x7B-Instruct-v0.1-FP8

Model Overview

This is Mixtral-8x7B-Instruct-v0.1 with weights and activations quantized to FP8, ready for inference with vLLM >= 0.5.0.

Usage and Creation

Produced with AutoFP8 using calibration samples from the ultrachat dataset; the block_sparse_moe.gate layers were kept at their original precision.

Evaluation

Open LLM Leaderboard evaluation scores

| Benchmark | Mixtral-8x7B-Instruct-v0.1 | Mixtral-8x7B-Instruct-v0.1-FP8 (this model) |
| --- | --- | --- |
| ARC-c (25-shot) | 71.50 | 71.08 |
| HellaSwag (10-shot) | 87.53 | 87.38 |
| MMLU (5-shot) | 70.33 | 70.00 |
| TruthfulQA (0-shot) | 64.79 | 64.20 |
| Winogrande (5-shot) | 82.40 | 82.40 |
| GSM8K (5-shot) | 64.36 | 64.06 |
| Average accuracy | 73.48 | 73.19 |
| Recovery | 100% | 99.61% |
Model size: 46.7B parameters (safetensors; tensor types BF16 and F8_E4M3).