This is a quantized version of Llama-3-SauerkrautLM-70b-Instruct. It was quantized with GPTQ, a method developed by IST Austria, using the following configuration:

  • 4bit
  • Act order: True
  • Group size: 128
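
For readers who want these settings in machine-readable form, the sketch below writes them as a small Python dictionary. The key names (bits, group_size, desc_act) follow the convention used by AutoGPTQ-style quantization configs; that mapping is an assumption, since this card only gives the human-readable labels.

```python
import json

# The three settings listed above, using AutoGPTQ-style key names.
# The key names are an assumption; the card only states the labels.
quantize_config = {
    "bits": 4,          # "4bit": weights are stored in 4 bits
    "group_size": 128,  # "Group size: 128": 128 weights share quantization parameters
    "desc_act": True,   # "Act order: True": activation-order quantization is enabled
}

print(json.dumps(quantize_config, indent=2))
```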

Usage

Install vLLM and run the server:

pip install vllm
python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
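
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names build_request and complete, and the max_tokens default of 64, are illustrative choices and not part of this model card. It assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # address used in the curl example
MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ"


def build_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    # max_tokens=64 is an illustrative default, not a recommended setting.
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the prompt to the running server and return the first completion."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, complete("San Francisco is a") returns the generated continuation as a string.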

Evaluations

English              Llama-3-SauerkrautLM-70b-Instruct  Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b  Llama-3-SauerkrautLM-70b-Instruct-GPTQ
Avg.                 78.17                              78.1                                       76.72
ARC                  74.5                               74.4                                       73.0
Hellaswag            79.2                               79.2                                       78.0
MMLU                 80.8                               80.7                                       79.15

German               Llama-3-SauerkrautLM-70b-Instruct  Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b  Llama-3-SauerkrautLM-70b-Instruct-GPTQ
Avg.                 70.83                              70.47                                      69.13
ARC_de               66.7                               66.2                                       65.9
Hellaswag_de         70.8                               71.0                                       68.8
MMLU_de              75.0                               74.2                                       72.7

Safety               Llama-3-SauerkrautLM-70b-Instruct  Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b  Llama-3-SauerkrautLM-70b-Instruct-GPTQ
Avg.                 65.86                              65.94                                      65.94
RealToxicityPrompts  97.6                               97.8                                       98.4
TruthfulQA           67.07                              66.92                                      65.56
CrowS                32.92                              33.09                                      33.87

We did not check for data contamination. Evaluations were run with the Eval. Harness with limit=1000.
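
The Avg. rows appear to be the unweighted mean of the three benchmark scores in each table, rounded to two decimals. The short sketch below recomputes the English averages from the ARC, Hellaswag, and MMLU scores reported above; the helper name average is illustrative.

```python
# ARC, Hellaswag, and MMLU scores copied from the English results above.
english_scores = {
    "Llama-3-SauerkrautLM-70b-Instruct": [74.5, 79.2, 80.8],
    "Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b": [74.4, 79.2, 80.7],
    "Llama-3-SauerkrautLM-70b-Instruct-GPTQ": [73.0, 78.0, 79.15],
}


def average(scores: list[float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


english_avg = {name: average(scores) for name, scores in english_scores.items()}

for name, value in english_avg.items():
    print(f"{name}: {value}")
```

The recomputed values match the reported Avg. row (78.17, 78.1, 76.72). The same check can be repeated for the German and Safety results.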

Performance

Hardware       requests/s  tokens/s
NVIDIA L40Sx2  2.19        1044.76

Performance measured on cortecs inference.
