
This is a quantized version of Meta-Llama-3-8B-Instruct, produced with GPTQ (a quantization method developed by IST Austria) using the following configuration:

  • Bits: 4
  • Act order: True
  • Group size: 128
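
As a rough illustration of what this configuration means for storage, the sketch below estimates the average number of bits per quantized weight. It assumes each group of 128 weights stores one 16-bit scale and one zero-point at the same width as the weights. These storage details are assumptions for illustration and are not stated in this card. The function name is hypothetical, and act order is not modeled.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16, zero_bits=None):
    """Average storage cost per weight for group-wise quantization.

    Assumption (not from the model card): every group stores one scale
    of `scale_bits` bits and one zero-point of `zero_bits` bits.
    """
    if zero_bits is None:
        zero_bits = bits  # assumption: zero-point stored at weight precision
    overhead = (scale_bits + zero_bits) / group_size  # metadata spread over the group
    return bits + overhead


bpw = effective_bits_per_weight(bits=4, group_size=128)
print(f"{bpw:.3f} bits per weight")
print(f"{16 / bpw:.2f}x smaller than fp16 for the quantized layers")
```

Under these assumptions the per-group metadata adds about 0.16 bits on top of the nominal 4 bits per weight.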

Usage

Install vLLM and run the server:

pip install vllm
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-8B-Instruct-GPTQ

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-8B-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
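
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names and the `max_tokens` value are illustrative additions, not part of the original example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # server address from the curl example
MODEL = "cortecs/Meta-Llama-3-8B-Instruct-GPTQ"


def build_completion_request(prompt, max_tokens=32):
    """Build a POST request for the /v1/completions endpoint.

    `max_tokens` is an illustrative addition; the curl example omits it.
    """
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` should return the model's continuation of the prompt.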

Evaluations

English              Meta-Llama-3-8B-Instruct  Meta-Llama-3-8B-Instruct-GPTQ-8b  Meta-Llama-3-8B-Instruct-GPTQ
Avg.                 66.97                     67.0                              63.52
ARC                  62.5                      62.5                              54.6
Hellaswag            70.3                      70.3                              69.5
MMLU                 68.11                     68.21                             66.46

French               Meta-Llama-3-8B-Instruct  Meta-Llama-3-8B-Instruct-GPTQ-8b  Meta-Llama-3-8B-Instruct-GPTQ
Avg.                 57.73                     57.7                              53.33
Hellaswag_fr         61.7                      62.2                              59.3
ARC_fr               53.3                      53.1                              46.4
MMLU_fr              58.2                      57.8                              54.3

German               Meta-Llama-3-8B-Instruct  Meta-Llama-3-8B-Instruct-GPTQ-8b  Meta-Llama-3-8B-Instruct-GPTQ
Avg.                 53.47                     53.67                             49.0
ARC_de               49.1                      49.0                              41.6
Hellaswag_de         55.0                      55.2                              53.3
MMLU_de              56.3                      56.8                              52.1

Italian              Meta-Llama-3-8B-Instruct  Meta-Llama-3-8B-Instruct-GPTQ-8b  Meta-Llama-3-8B-Instruct-GPTQ
Avg.                 56.73                     56.67                             51.3
Hellaswag_it         61.3                      61.3                              58.4
MMLU_it              57.3                      57.0                              53.0
ARC_it               51.6                      51.7                              42.5

Safety               Meta-Llama-3-8B-Instruct  Meta-Llama-3-8B-Instruct-GPTQ-8b  Meta-Llama-3-8B-Instruct-GPTQ
Avg.                 61.42                     61.42                             61.53
RealToxicityPrompts  97.2                      97.2                              97.2
TruthfulQA           51.65                     51.58                             51.98
CrowS                35.42                     35.48                             35.42

Spanish              Meta-Llama-3-8B-Instruct  Meta-Llama-3-8B-Instruct-GPTQ-8b  Meta-Llama-3-8B-Instruct-GPTQ
Avg.                 59.0                      58.63                             54.6
ARC_es               54.1                      53.8                              46.9
Hellaswag_es         63.8                      63.3                              60.3
MMLU_es              59.1                      58.8                              56.6

We did not check for data contamination. Evaluations were run with the Eval Harness using limit=1000.
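
The Avg. rows appear to be the unweighted mean of the three benchmark scores in each block. The sketch below checks this reading against the English scores and prints each model's difference from the unquantized baseline. The variable names are illustrative.

```python
from statistics import mean

# English scores copied from the evaluation table: (ARC, Hellaswag, MMLU)
english = {
    "Meta-Llama-3-8B-Instruct": (62.5, 70.3, 68.11),
    "Meta-Llama-3-8B-Instruct-GPTQ-8b": (62.5, 70.3, 68.21),
    "Meta-Llama-3-8B-Instruct-GPTQ": (54.6, 69.5, 66.46),
}

# Unweighted mean per model, rounded to two decimals like the Avg. row
averages = {name: round(mean(scores), 2) for name, scores in english.items()}
baseline = averages["Meta-Llama-3-8B-Instruct"]

for name, avg in averages.items():
    print(f"{name}: avg={avg:.2f}, delta vs. baseline={avg - baseline:+.2f}")
```

The same check can be repeated for the other blocks by swapping in their scores.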

Performance

Hardware     requests/s  tokens/s
NVIDIA L4x1  3.96        1887.55
NVIDIA L4x2  4.87        2323.34
NVIDIA L4x4  5.61        2674.18

Performance measured on cortecs inference.
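
To put the throughput figures in context, the sketch below derives the speedup over a single L4, the scaling efficiency per GPU, and the ratio of tokens/s to requests/s. The card does not say whether tokens/s counts generated tokens only or prompt tokens as well, so the last figure is just that ratio. The variable names are illustrative.

```python
# Number of NVIDIA L4 GPUs -> (requests/s, tokens/s), copied from the table
throughput = {1: (3.96, 1887.55), 2: (4.87, 2323.34), 4: (5.61, 2674.18)}

base_rps, _ = throughput[1]
scaling = {}
for gpus, (rps, tps) in throughput.items():
    speedup = rps / base_rps        # request throughput relative to one L4
    efficiency = speedup / gpus     # 1.0 would be perfect linear scaling
    tokens_per_request = tps / rps  # ratio of the two reported rates
    scaling[gpus] = (speedup, efficiency, tokens_per_request)
    print(f"{gpus}x L4: speedup={speedup:.2f}x, efficiency={efficiency:.0%}, "
          f"tokens/request={tokens_per_request:.0f}")
```

On these numbers, going from one to four L4s raises request throughput by roughly 1.4x rather than 4x.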
