neuralmagic
/

Mixtral-8x22B-Instruct-v0.1-FP8

Text Generation

Inference Endpoints

text-generation-inference

Model card Files Files and versions Community

Edit model card

Mixtral-8x22B-Instruct-v0.1-FP8

Model Overview

Mixtral-8x22B-Instruct-v0.1 quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.

Usage and Creation

Produced using AutoFP8 with calibration samples from ultrachat.

Evaluation

Open LLM Leaderboard evaluation scores

	Mixtral-8x22B-Instruct-v0.1	Mixtral-8x22B-Instruct-v0.1-FP8 (this model)
arc-c 25-shot (acc_norm)	72.70	72.53
hellaswag 10-shot (acc_norm)	89.08	88.10
mmlu 5-shot	77.77	76.08
truthfulqa 0-shot (acc)	68.14	66.32
winogrande 5-shot (acc)	85.16	84.37
gsm8k 5-shot (strict-match)	82.03	83.40
Average Accuracy	79.15	78.47
Recovery	100%	99.14%

Downloads last month: 16

Safetensors

Model size

141B params

Tensor type

BF16

·

F8_E4M3

·

Collection including neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8

FP8 LLMs for vLLM

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 10 items • Updated 1 day ago • 15