datasets: wikitext

This is a quantized version of Mistral-7B-Instruct-v0.3. It was quantized with GPTQ, a method developed by IST Austria, using the following configuration:

  • Bits: 4
  • Act order: True
  • Group size: 128
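
To give a feel for what a group size of 128 means for storage, here is a rough sketch of the effective bits per weight. It assumes each group of 128 weights shares one 16-bit scale and one 4-bit zero point; those per-group widths are illustrative assumptions, not values taken from this model's checkpoint, and the function name is hypothetical.

```python
def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Approximate storage cost per weight for group-wise quantization.

    scale_bits and zero_bits are assumed per-group metadata widths
    (hypothetical defaults, not read from the actual checkpoint).
    """
    # Each weight costs `bits`; the per-group metadata is amortized
    # over the `group_size` weights that share it.
    return bits + (scale_bits + zero_bits) / group_size


if __name__ == "__main__":
    print(effective_bits_per_weight())               # 4-bit, group size 128
    print(effective_bits_per_weight(group_size=32))  # smaller groups cost more
```

Under these assumptions the per-group overhead is small: larger groups amortize the metadata over more weights, while smaller groups track the weight distribution more closely at a higher storage cost.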

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b",
        "prompt": "San Francisco is a"
    }'
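
The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. It assumes the server is reachable at localhost:8000 as in that example; the helper names and the `max_tokens` value are illustrative choices, not part of the original instructions.

```python
import json
import urllib.request

# Assumed from the curl example: the server listens on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b"


def build_completion_request(prompt, max_tokens=32, base_url=BASE_URL):
    """Build (without sending) a POST request for the completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` should return the generated continuation as a string.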

Evaluations

| English | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
| --- | --- | --- | --- |
| Avg. | 67.65 | 67.72 | 66.95 |
| ARC | 64.2 | 64.1 | 62.1 |
| Hellaswag | 75.6 | 75.6 | 76.0 |
| MMLU | 63.16 | 63.47 | 62.75 |

| French | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
| --- | --- | --- | --- |
| Avg. | 56.4 | 56.17 | 54.77 |
| ARC_fr | 51.9 | 51.4 | 50.0 |
| Hellaswag_fr | 65.8 | 65.8 | 63.8 |
| MMLU_fr | 51.5 | 51.3 | 50.5 |

| German | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
| --- | --- | --- | --- |
| Avg. | 51.83 | 51.73 | 51.7 |
| ARC_de | 47.6 | 47.5 | 47.3 |
| Hellaswag_de | 58.9 | 59.0 | 57.3 |
| MMLU_de | 49.0 | 48.7 | 50.5 |

| Italian | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
| --- | --- | --- | --- |
| Avg. | 54.93 | 54.8 | 52.83 |
| ARC_it | 51.6 | 51.6 | 49.3 |
| Hellaswag_it | 63.5 | 63.8 | 61.0 |
| MMLU_it | 49.7 | 49.0 | 48.2 |

| Safety | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
| --- | --- | --- | --- |
| Avg. | 60.32 | 60.54 | 64.8 |
| RealToxicityPrompts | 89.7 | 90.0 | 90.7 |
| TruthfulQA | 59.71 | 59.48 | 58.32 |
| CrowS | 31.54 | 32.14 | 45.38 |

| Spanish | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
| --- | --- | --- | --- |
| Avg. | 57.9 | 57.97 | 56.1 |
| ARC_es | 53.5 | 53.5 | 51.0 |
| Hellaswag_es | 68.5 | 68.5 | 66.2 |
| MMLU_es | 51.7 | 51.9 | 51.1 |

We did not check for data contamination. Evaluation was run with Eval. Harness using limit=1000, so each task was scored on at most 1000 examples.
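
The Avg. rows above match the unweighted mean of the three task scores in each group, rounded to two decimals. The sketch below recomputes the English averages from the reported numbers. The dictionary layout and function name are illustrative, and reading Avg. as a plain mean is an inference from the figures, not something the card states.

```python
# English scores copied from the evaluation results above.
ENGLISH_SCORES = {
    "Mistral-7B-Instruct-v0.3": {"ARC": 64.2, "Hellaswag": 75.6, "MMLU": 63.16},
    "Mistral-7B-Instruct-v0.3-GPTQ-8b": {"ARC": 64.1, "Hellaswag": 75.6, "MMLU": 63.47},
    "Mistral-7B-Instruct-v0.3-GPTQ-4b": {"ARC": 62.1, "Hellaswag": 76.0, "MMLU": 62.75},
}


def average_score(task_scores):
    """Unweighted mean over tasks, rounded to two decimals (assumed definition of Avg.)."""
    return round(sum(task_scores.values()) / len(task_scores), 2)


def average_drop(scores, baseline, quantized):
    """Difference in average score between the baseline and a quantized variant."""
    return round(average_score(scores[baseline]) - average_score(scores[quantized]), 2)


if __name__ == "__main__":
    for model, task_scores in ENGLISH_SCORES.items():
        print(model, average_score(task_scores))
```

Calling `average_drop(ENGLISH_SCORES, "Mistral-7B-Instruct-v0.3", "Mistral-7B-Instruct-v0.3-GPTQ-4b")` gives the change in the English average for the 4-bit model relative to the unquantized baseline.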

Performance

| GPU | requests/s | tokens/s |
| --- | --- | --- |
| NVIDIA L4x1 | 3.75 | 1867.13 |
| NVIDIA L4x2 | 5.03 | 2503.83 |
| NVIDIA L4x4 | 5.86 | 2916.3 |

Performance measured on cortecs inference.
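
To read the throughput figures as scaling behaviour, the sketch below derives the speedup over a single GPU, the per-GPU efficiency, and the implied tokens per request. It assumes all three rows were measured under the same workload, which the card does not state; the function and key names are illustrative.

```python
# GPU count -> (requests/s, tokens/s), copied from the performance figures above.
MEASUREMENTS = {
    1: (3.75, 1867.13),
    2: (5.03, 2503.83),
    4: (5.86, 2916.3),
}


def scaling_report(measurements):
    """Derive speedup, per-GPU efficiency and tokens per request for each GPU count."""
    base_rps, _ = measurements[1]
    report = {}
    for gpus, (rps, tps) in sorted(measurements.items()):
        speedup = rps / base_rps  # relative to the single-GPU row
        report[gpus] = {
            "speedup": speedup,
            "efficiency": speedup / gpus,  # 1.0 would be perfectly linear scaling
            "tokens_per_request": tps / rps,
        }
    return report


if __name__ == "__main__":
    for gpus, row in scaling_report(MEASUREMENTS).items():
        print(gpus, {name: round(value, 2) for name, value in row.items()})
```

On these numbers, request throughput rises by roughly 1.34x with two GPUs and 1.56x with four, so adding GPUs increases throughput sublinearly.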