Metadata
datasets: wikitext
license: other
license_link: https://llama.meta.com/llama3/license/

This is a quantized version of Meta-Llama-3-8B-Instruct, produced with GPTQ (a post-training quantization method developed by IST Austria) using the following configuration:

  • Bits: 8
  • Act order: True
  • Group size: 128
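To make the configuration concrete, here is a minimal pure-Python sketch of what "8 bits, group size 128" means for how weights are stored. It is an illustration under stated assumptions, not GPTQ itself: it uses plain round-to-nearest quantization per group, whereas GPTQ additionally compensates each column's rounding error using second-order information, and act order (processing columns by decreasing activation magnitude) is not modeled. The names `quantize_group`, `dequantize_group` and `quantize_row` are hypothetical.

```python
from typing import List, Tuple

BITS = 8                 # "Bits: 8" from the configuration above
GROUP_SIZE = 128         # "Group size: 128"
QMAX = 2**BITS - 1       # largest unsigned 8-bit code (255)


def quantize_group(ws: List[float]) -> Tuple[List[int], float, int]:
    """Round-to-nearest asymmetric quantization of one group (no GPTQ error correction)."""
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / QMAX if hi > lo else 1.0   # step size of the integer grid
    zero = round(-lo / scale)                        # zero point: the code that dequantizes to 0.0
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in ws]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [(q - zero) * scale for q in codes]


def quantize_row(row: List[float]) -> List[Tuple[List[int], float, int]]:
    """Quantize one weight row in independent groups of GROUP_SIZE values."""
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


# Example: a 512-wide row becomes 4 groups, each with its own scale and zero point.
row = [((i * 37) % 101 - 50) / 1000 for i in range(512)]
groups = quantize_row(row)
```

Because each 128-weight group carries its own scale and zero point, an outlier only coarsens the quantization grid of its own group rather than the grid of the whole row.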

Usage

Install vLLM and start its OpenAI-compatible API server:

```shell
pip install vllm
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b
```

Query the model through the server's completions endpoint:

```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b",
        "prompt": "San Francisco is a"
    }'
```
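For programmatic access, the same request can be sent from Python. The sketch below uses only the standard library (`urllib.request`, `json`) and assumes the vLLM server is listening on the default `localhost:8000`; the helper names `build_request`, `extract_text` and `complete` are hypothetical, and `max_tokens` (a standard OpenAI completions parameter) is set to an arbitrary 32.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default host/port, as in the curl example
MODEL = "cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b"


def build_request(prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    body = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """OpenAI-style completion responses carry generations in choices[i]["text"]."""
    return response["choices"][0]["text"]


def complete(prompt: str, max_tokens: int = 32) -> str:
    """Send the request to a running server and return the first completion."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        return extract_text(json.load(resp))
```

With the server running, `complete("San Francisco is a")` returns the generated continuation as a string. Any OpenAI-compatible client library could be used instead; the standard library is used here only to keep the sketch dependency-free.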

Evaluations

| English | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
|---|---|---|---|
| Avg. | 66.97 | 67.0 | 63.52 |
| ARC | 62.5 | 62.5 | 54.6 |
| Hellaswag | 70.3 | 70.3 | 69.5 |
| MMLU | 68.11 | 68.21 | 66.46 |

| French | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
|---|---|---|---|
| Avg. | 57.73 | 57.7 | 53.33 |
| Hellaswag_fr | 61.7 | 62.2 | 59.3 |
| ARC_fr | 53.3 | 53.1 | 46.4 |
| MMLU_fr | 58.2 | 57.8 | 54.3 |

| German | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
|---|---|---|---|
| Avg. | 53.47 | 53.67 | 49.0 |
| ARC_de | 49.1 | 49.0 | 41.6 |
| Hellaswag_de | 55.0 | 55.2 | 53.3 |
| MMLU_de | 56.3 | 56.8 | 52.1 |

| Italian | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
|---|---|---|---|
| Avg. | 56.73 | 56.67 | 51.3 |
| Hellaswag_it | 61.3 | 61.3 | 58.4 |
| MMLU_it | 57.3 | 57.0 | 53.0 |
| ARC_it | 51.6 | 51.7 | 42.5 |

| Safety | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
|---|---|---|---|
| Avg. | 61.42 | 61.42 | 61.53 |
| RealToxicityPrompts | 97.2 | 97.2 | 97.2 |
| TruthfulQA | 51.65 | 51.58 | 51.98 |
| CrowS | 35.42 | 35.48 | 35.42 |

| Spanish | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
|---|---|---|---|
| Avg. | 59 | 58.63 | 54.6 |
| ARC_es | 54.1 | 53.8 | 46.9 |
| Hellaswag_es | 63.8 | 63.3 | 60.3 |
| MMLU_es | 59.1 | 58.8 | 56.6 |
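Each "Avg." row is the unweighted mean of the three benchmarks in its table, which can be checked directly. The sketch below recomputes the English averages and the score of each quantized model relative to the unquantized baseline; the dictionary and the helpers `column_averages` and `relative_score` are illustrative names, and the numbers are copied from the English table.

```python
from statistics import mean

# English scores: (Meta-Llama-3-8B-Instruct, GPTQ-8b, GPTQ)
ENGLISH = {
    "ARC": (62.5, 62.5, 54.6),
    "Hellaswag": (70.3, 70.3, 69.5),
    "MMLU": (68.11, 68.21, 66.46),
}


def column_averages(table, ndigits=2):
    """Unweighted mean of each model column across the table's benchmarks."""
    return tuple(round(mean(col), ndigits) for col in zip(*table.values()))


def relative_score(table, model_idx, baseline_idx=0):
    """Ratio of a model's average to the baseline average (1.0 = no change)."""
    avgs = [mean(col) for col in zip(*table.values())]
    return avgs[model_idx] / avgs[baseline_idx]


print(column_averages(ENGLISH))
print(f"GPTQ-8b vs. unquantized: {relative_score(ENGLISH, 1):.4f}")
print(f"GPTQ vs. unquantized:    {relative_score(ENGLISH, 2):.4f}")
```

The same helpers apply unchanged to the other language tables.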

We did not check for data contamination. Evaluation was done with the Language Model Evaluation Harness using limit=1000, i.e. at most 1,000 examples per task.
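Because each task was capped at 1,000 examples, small score differences should be read against sampling noise. Assuming a score is a plain accuracy over n independent items, its binomial standard error is sqrt(p(1 - p)/n); for a score near 53% and n = 1000 that is roughly ±1.6 points. This is a back-of-the-envelope sketch, not part of the original evaluation, and for benchmarks that aggregate many subtasks (such as MMLU) the effective n may differ. The function name is hypothetical.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy p (0..1) measured on n independent items."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumption: ARC_fr = 53.3 (unquantized model) is an accuracy over n = 1000 items.
se = accuracy_standard_error(0.533, 1000)
print(f"1 s.e.: ±{100 * se:.2f} points, 95% interval: ±{100 * 1.96 * se:.2f} points")
```

By this yardstick, the per-benchmark gaps between Meta-Llama-3-8B-Instruct and the GPTQ-8b model (at most 0.5 points in the evaluation tables) are well within one standard error.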

Performance

| GPUs | requests/s | tokens/s |
|---|---|---|
| NVIDIA L4 x1 | 2.75 | 1312.26 |
| NVIDIA L4 x2 | 4.36 | 2080.17 |
| NVIDIA L4 x4 | 5.33 | 2539.76 |
Throughput was measured on cortecs inference.
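The throughput figures also support a quick scaling check. The sketch below computes, from the numbers as published, the tokens/s speedup over a single L4, the per-GPU scaling efficiency, and the ratio of tokens/s to requests/s; `THROUGHPUT` and `scaling_report` are illustrative names. On these figures, four L4s deliver roughly 1.94x the single-GPU token throughput (about 48% per-GPU efficiency), and the tokens-per-request ratio stays near 477 in all three configurations.

```python
# Published throughput: number of NVIDIA L4 GPUs -> (requests/s, tokens/s)
THROUGHPUT = {1: (2.75, 1312.26), 2: (4.36, 2080.17), 4: (5.33, 2539.76)}


def scaling_report(data):
    """Speedup and efficiency relative to the 1-GPU row, plus tokens per request."""
    base_tokens = data[1][1]
    report = {}
    for gpus, (req_s, tok_s) in sorted(data.items()):
        speedup = tok_s / base_tokens
        report[gpus] = {
            "speedup": speedup,
            "efficiency": speedup / gpus,        # 1.0 would be perfectly linear scaling
            "tokens_per_request": tok_s / req_s,  # ratio of the two published columns
        }
    return report


for gpus, row in scaling_report(THROUGHPUT).items():
    print(f"{gpus}x L4: {row['speedup']:.2f}x speedup, "
          f"{row['efficiency']:.0%} efficiency, {row['tokens_per_request']:.0f} tokens/request")
```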