---
datasets: LeoLM/wikitext-en-de
license: other
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized model of [Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOSolutions/Llama-3-SauerkrautLM-70b-Instruct) created using GPTQ, a quantization method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/), with the following configuration:
- Bits: 4
- Act order: True
- Group size: 128
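
The exact quantization script is not part of this card; a minimal sketch of an equivalent setup with AutoGPTQ (model path, calibration text, and output directory are illustrative placeholders) could look like this:
```
# Sketch only: applies the stated GPTQ settings (4 bit, act order, group size 128).
# Calibration examples and paths are placeholders, not the actual data used.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct"
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # group size 128
    desc_act=True,   # act order
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# A real run would use a larger, representative calibration set.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]
model.quantize(examples)
model.save_quantized("Llama-3-SauerkrautLM-70b-Instruct-GPTQ")
```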

## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
```
python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ
```
Access the model:
```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
```
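
Since the server exposes an OpenAI-compatible API, it can also be queried from Python. A minimal sketch using the `openai` client (the base URL and dummy API key assume the default local vLLM setup above):
```
# Sketch: query the local vLLM server through the OpenAI-compatible API.
# Assumes the server above runs on localhost:8000; the API key is a dummy value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    prompt="San Francisco is a",
    max_tokens=64,
)
print(completion.choices[0].text)
```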

## Evaluations
| __English__   | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__   | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__   | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__   |
|:--------------|:------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------|
| Avg.          | 78.17                                                                                                             | 78.1                                                                                                                        | 76.72                                                                                                                 |
| ARC           | 74.5                                                                                                              | 74.4                                                                                                                        | 73.0                                                                                                                  |
| Hellaswag     | 79.2                                                                                                              | 79.2                                                                                                                        | 78.0                                                                                                                  |
| MMLU          | 80.8                                                                                                              | 80.7                                                                                                                        | 79.15                                                                                                                 |

| __German__   | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__   | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__   | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__   |
|:-------------|:-------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|
| Avg.         | 70.83                                                                                                             | 70.47                                                                                                                       | 69.13                                                                                                                 |
| ARC_de       | 66.7                                                                                                              | 66.2                                                                                                                        | 65.9                                                                                                                  |
| Hellaswag_de | 70.8                                                                                                              | 71.0                                                                                                                        | 68.8                                                                                                                  |
| MMLU_de      | 75.0                                                                                                              | 74.2                                                                                                                        | 72.7                                                                                                                  |

| __Safety__          |   __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ |   __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ |   __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
|:--------------------|-------------------------------------------------------------------------------------------------------------------:|-------------------------------------------------------------------------------------------------------------------------------:|-----------------------------------------------------------------------------------------------------------------------:|
| Avg.                |                                                                                                             65.86 |                                                                                                                       65.94 |                                                                                                                 65.94 |
| RealToxicityPrompts |                                                                                                             97.6  |                                                                                                                       97.8  |                                                                                                                 98.4  |
| TruthfulQA          |                                                                                                             67.07 |                                                                                                                       66.92 |                                                                                                                 65.56 |
| CrowS               |                                                                                                             32.92 |                                                                                                                       33.09 |                                                                                                                 33.87 |

We did not check for data contamination.
Evaluation was done using the [Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000`.
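
The exact harness version and task list are not given here; a rough sketch of how a comparable run could be launched through the harness's Python API (task names, backend, and model arguments are assumptions based on the tables above):
```
# Sketch: evaluation in the spirit of the numbers above, using lm-evaluation-harness.
# Task selection, backend, and model arguments are assumptions, not the exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ,tensor_parallel_size=2",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa"],
    limit=1000,  # at most 1000 examples per task, as stated above
)
print(results["results"])
```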
    
## Performance
| Hardware      |   requests/s |   tokens/s |
|:--------------|-------------:|-----------:|
| NVIDIA L40Sx2 |         2.19 |    1044.76 |
Performance measured on [cortecs inference](https://cortecs.ai).
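
Throughput depends on batch size, sequence lengths, and the serving stack. As a rough local cross-check, offline throughput can be estimated with vLLM directly; in the sketch below the tensor-parallel size, prompts, and sampling settings are illustrative choices, not the benchmark configuration used above:
```
# Rough sketch: estimate offline generation throughput with vLLM.
# Tensor-parallel size, prompts, and sampling parameters are illustrative.
import time
from vllm import LLM, SamplingParams

llm = LLM(
    model="cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    tensor_parallel_size=2,  # e.g. two NVIDIA L40S GPUs
)
prompts = ["San Francisco is a"] * 32
params = SamplingParams(max_tokens=128)

start = time.time()
outputs = llm.generate(prompts, params)
elapsed = time.time() - start

generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{len(prompts) / elapsed:.2f} requests/s, {generated_tokens / elapsed:.2f} tokens/s")
```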