# Meta-Llama-3-70B-Instruct-FP8

## Model Overview

This is Meta-Llama-3-70B-Instruct with weights and activations quantized to FP8 using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
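Per-tensor quantization maps an entire tensor to FP8 using a single scale derived from its maximum magnitude. The following is a rough NumPy sketch of that round-trip, simulating E4M3 rounding (3 mantissa bits, max magnitude 448); it illustrates the idea only and is not the actual quantization code used to produce this checkpoint:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def fp8_e4m3_round(x):
    """Simulate rounding to FP8 E4M3 precision (1 implicit + 3 mantissa bits).

    Subnormals are ignored for simplicity; values beyond the FP8 range are
    saturated to +/- FP8_E4M3_MAX.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    mant, exp = np.frexp(x[nz])        # x = mant * 2**exp with 0.5 <= |mant| < 1
    mant = np.round(mant * 16) / 16    # keep 4 significant binary digits
    out[nz] = np.ldexp(mant, exp)
    return np.clip(out, -FP8_E4M3_MAX, FP8_E4M3_MAX)


def quantize_dequantize(w):
    """Per-tensor FP8 quantize/dequantize: one scale for the whole tensor."""
    scale = np.abs(w).max() / FP8_E4M3_MAX
    return fp8_e4m3_round(w / scale) * scale, scale
```

Because the scale is chosen so the tensor's largest value lands exactly on the FP8 maximum, the dynamic range is preserved while every element is stored in 8 bits.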

## Usage and Creation

Produced using AutoFP8 with calibration samples from the UltraChat dataset.
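Static activation quantization requires fixing each activation scale before inference, which is what the calibration samples are for: the scale is estimated from the magnitudes observed while running representative inputs through the model. A minimal sketch of the idea (the function name is illustrative, not AutoFP8's API):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def calibrate_activation_scale(activation_batches):
    """Pick one static per-tensor scale from the max magnitude observed
    across all calibration batches for a given activation tensor."""
    observed_max = max(np.abs(batch).max() for batch in activation_batches)
    return observed_max / FP8_E4M3_MAX
```

If the calibration data is representative, activations seen at inference time rarely exceed the observed maximum, so little clipping occurs.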

## Evaluation

Open LLM Leaderboard evaluation scores:

| Benchmark | Meta-Llama-3-70B-Instruct | Meta-Llama-3-70B-Instruct-FP8 (this model) |
|---|---|---|
| ARC-C (25-shot) | 72.69 | 72.61 |
| HellaSwag (10-shot) | 85.50 | 85.41 |
| MMLU (5-shot) | 80.18 | 80.06 |
| TruthfulQA (0-shot) | 62.90 | 62.73 |
| Winogrande (5-shot) | 83.34 | 83.03 |
| GSM8K (5-shot) | 92.49 | 91.12 |
| **Average accuracy** | **79.51** | **79.16** |
| **Recovery** | **100%** | **99.55%** |
Format: Safetensors · Model size: 70.6B params · Tensor types: BF16, F8_E4M3
