metadata

tags:
  - fp8
  - vllm

Mixtral-8x7B-Instruct-v0.1-FP8

Model Overview

Mixtral-8x7B-Instruct-v0.1 quantized to FP8 weights and activations, ready for inference with vLLM >= 0.5.0.

Produced using AutoFP8 with calibration samples from ultrachat with block_sparse_moe.gate layers kept at original precision.

	Mixtral-8x7B-Instruct-v0.1	Mixtral-8x7B-Instruct-v0.1-FP8 (this model)
arc-c 25-shot	71.50	71.08
hellaswag 10-shot	87.53	87.38
mmlu 5-shot	70.33	70.00
truthfulqa 0-shot	64.79	64.20
winogrande 5-shot	82.40	82.40
gsm8k 5-shot	64.36	64.06
Average Accuracy	73.48	73.19
Recovery	100%	99.61%