
# Model Card for Mistral-7B-Instruct-v0.3 quantized to 4-bit weights

- Weight-only quantization of Mistral-7B-Instruct-v0.3 via GPTQ to 4-bit weights with `group_size=128` (see the sketch below)
- The GPTQ quantization recovers 99.75% of the unquantized model's average accuracy on the Open LLM Leaderboard
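
For reference, a minimal sketch of how a checkpoint with these settings can be produced. The card does not say which toolchain was used; this example assumes the AutoGPTQ library, and the calibration text is a placeholder (a real run would use a few hundred samples):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weight-only quantization with one scale per group of 128 weights,
# matching the settings described above.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ calibrates the quantized weights against sample activations;
# this single-sentence calibration set is purely illustrative.
examples = [tokenizer("GPTQ minimizes layer-wise quantization error.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("Mistral-7B-Instruct-v0.3-GPTQ-4bit")
```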

## Open LLM Leaderboard evaluation scores

| Benchmark | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-4bit (this model) |
| --- | --- | --- |
| ARC-c (25-shot) | 63.48 | 63.40 |
| MMLU (5-shot) | 61.13 | 60.89 |
| HellaSwag (10-shot) | 84.49 | 84.04 |
| WinoGrande (5-shot) | 79.16 | 79.08 |
| GSM8K (5-shot) | 43.37 | 45.41 |
| TruthfulQA (0-shot) | 59.65 | 57.48 |
| **Average accuracy** | 65.21 | 65.05 |
| **Recovery** | 100% | 99.75% |
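
Recovery is the quantized model's average accuracy expressed as a fraction of the baseline: 65.05 / 65.21 ≈ 99.75%.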

## vLLM Inference Performance

This model is ready for optimized inference using the Marlin mixed-precision kernels in vLLM: https://github.com/vllm-project/vllm
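
The model can also be loaded directly through vLLM's offline `LLM` API. Below is a minimal sketch; the prompt and sampling settings are illustrative and not part of this card:

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM reads the GPTQ config from the
# repo and can use the Marlin kernels on supported hardware.
llm = LLM(model="neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit")

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Mistral-Instruct models expect the [INST] ... [/INST] chat format.
outputs = llm.generate(
    ["[INST] Explain GPTQ quantization in one paragraph. [/INST]"],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```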

Simply start this model as an inference server with:

```bash
python -m vllm.entrypoints.openai.api_server --model neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit
```
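
The server exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI client can query it. A minimal sketch assuming the `openai` Python package; the prompt is illustrative:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not require a real API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit",
    messages=[{"role": "user", "content": "What is weight-only quantization?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```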
