mgoin's picture
Update README.md
54e6f9f verified
metadata
tags:
  - fp8
  - vllm

Mixtral-8x7B-Instruct-v0.1-FP8

Model Overview

Mixtral-8x7B-Instruct-v0.1 quantized to FP8 weights and activations, ready for inference with vLLM >= 0.5.0.

Usage and Creation

Produced using AutoFP8 with calibration samples from ultrachat with block_sparse_moe.gate layers kept at original precision.

Evaluation

Open LLM Leaderboard evaluation scores

Mixtral-8x7B-Instruct-v0.1 Mixtral-8x7B-Instruct-v0.1-FP8
(this model)
arc-c
25-shot
71.50 71.08
hellaswag
10-shot
87.53 87.38
mmlu
5-shot
70.33 70.00
truthfulqa
0-shot
64.79 64.20
winogrande
5-shot
82.40 82.40
gsm8k
5-shot
64.36 64.06
Average
Accuracy
73.48 73.19
Recovery 100% 99.61%