
This is a quantized version of Llama-3 70B Instruct, produced with GPTQ, a post-training quantization method developed at IST Austria, using the following configuration (see the sketch after the list):

  • Bits: 4 (an 8-bit version will follow)
  • Act order: True
  • Group size: 128
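
For reference, here is a minimal sketch of the same configuration expressed as a transformers GPTQConfig; the calibration dataset and remaining hyperparameters used by the authors are not documented here, so treat it as illustrative:

    from transformers import GPTQConfig

    # Approximate settings taken from the list above; everything else
    # (calibration data, damping, etc.) is left at library defaults.
    quantization_config = GPTQConfig(
        bits=4,          # 4-bit weights (an 8-bit variant will follow)
        group_size=128,  # one scale/zero-point per group of 128 weights
        desc_act=True,   # act order: quantize columns by decreasing activation magnitude
    )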

Usage

Install vLLM (pip install vllm) and start the OpenAI-compatible server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
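
At 4-bit precision the 70B weights still occupy roughly 40 GB, so the model may not fit on a single GPU. On a multi-GPU machine (such as the 2x L40S setup in the Performance section below), vLLM can shard it via tensor parallelism:

    python -m vllm.entrypoints.openai.api_server \
        --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ \
        --tensor-parallel-size 2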

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
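
The same endpoint can also be queried with the openai Python client (pip install openai); a minimal sketch, assuming the server above is running on localhost:8000:

    from openai import OpenAI

    # vLLM does not verify the API key by default, so any placeholder works.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.completions.create(
        model="cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
        prompt="San Francisco is a",
        max_tokens=64,
    )
    print(completion.choices[0].text)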

Evaluations

| English | Llama-3 70B Instruct | Llama-3 70B Instruct GPTQ | Mixtral Instruct |
|---|---|---|---|
| Avg. | 76.19 | 75.14 | 73.17 |
| ARC | 71.6 | 70.7 | 71.0 |
| Hellaswag | 77.3 | 76.4 | 77.0 |
| MMLU | 79.66 | 78.33 | 71.52 |

| French | Llama-3 70B Instruct | Llama-3 70B Instruct GPTQ | Mixtral Instruct |
|---|---|---|---|
| Avg. | 70.97 | 70.27 | 68.7 |
| ARC_fr | 65.0 | 64.7 | 63.9 |
| Hellaswag_fr | 72.4 | 71.4 | 77.1 |
| MMLU_fr | 75.5 | 74.7 | 65.1 |

| German | Llama-3 70B Instruct | Llama-3 70B Instruct GPTQ | Mixtral Instruct |
|---|---|---|---|
| Avg. | 68.43 | 66.93 | 66.47 |
| ARC_de | 64.2 | 62.6 | 62.8 |
| Hellaswag_de | 67.8 | 66.7 | 72.1 |
| MMLU_de | 73.3 | 71.5 | 64.5 |

| Italian | Llama-3 70B Instruct | Llama-3 70B Instruct GPTQ | Mixtral Instruct |
|---|---|---|---|
| Avg. | 70.17 | 68.63 | 67.17 |
| ARC_it | 64.0 | 62.1 | 63.8 |
| Hellaswag_it | 72.6 | 71.0 | 75.6 |
| MMLU_it | 73.9 | 72.8 | 62.1 |

| Safety | Llama-3 70B Instruct | Llama-3 70B Instruct GPTQ | Mixtral Instruct |
|---|---|---|---|
| Avg. | 64.28 | 63.64 | 63.56 |
| RealToxicityPrompts | 97.9 | 98.1 | 93.2 |
| TruthfulQA | 61.91 | 59.91 | 64.61 |
| CrowS | 33.04 | 32.92 | 32.86 |

| Spanish | Llama-3 70B Instruct | Llama-3 70B Instruct GPTQ | Mixtral Instruct |
|---|---|---|---|
| Avg. | 72.5 | 71.3 | 68.8 |
| ARC_es | 66.7 | 65.7 | 64.4 |
| Hellaswag_es | 75.8 | 74.0 | 77.5 |
| MMLU_es | 75.0 | 74.2 | 64.6 |

Take these results with caution: we did not check for data contamination. Evaluation was done with the LM Evaluation Harness, using limit=1000 for large datasets.
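
The exact invocation is not documented; with the current lm-evaluation-harness CLI, a comparable run would look roughly like this (the task list shown is illustrative, not the authors' exact setup):

    lm_eval --model hf \
        --model_args pretrained=cortecs/Meta-Llama-3-70B-Instruct-GPTQ \
        --tasks arc_challenge,hellaswag,mmlu \
        --limit 1000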

Performance

| Hardware | requests/s | tokens/s |
|---|---|---|
| NVIDIA L40Sx2 | 2 | 951.28 |
