---
datasets: wikitext
license: llama3
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized model of [Llama-3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), built with GPTQ, a quantization method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/), using the following configuration (see the sketch below for how these settings map to a quantization config):
- Bits: 4 (an 8-bit version will follow)
- Act order: True
- Group size: 128
- Seq. length: 4096
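
These settings correspond roughly to a `GPTQConfig` in Hugging Face Transformers/Optimum. The snippet below is only an illustrative sketch, not the script actually used to produce this checkpoint; the calibration dataset is assumed from the `wikitext` tag in the metadata above.
```
# Illustrative sketch: how the configuration above maps to a Transformers GPTQConfig.
# This is NOT the exact script used to produce this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

quant_config = GPTQConfig(
    bits=4,               # 4-bit weights (8-bit will follow)
    group_size=128,       # group size 128
    desc_act=True,        # act order: True
    dataset="wikitext2",  # calibration data (assumed from the `wikitext` tag above)
    tokenizer=tokenizer,
)

# Quantizes the model on load; requires optimum and a GPTQ backend (e.g. auto-gptq).
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```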

## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
```
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
```
Access the model:
```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
        "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nTell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    }'
```
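
The same endpoint can also be queried programmatically. Below is a minimal sketch using the `openai` Python client; the base URL assumes the local server started above, and the API key is a placeholder the client requires.
```
# Minimal sketch: query the local vLLM OpenAI-compatible server via the
# official `openai` client. Assumes the server from the command above is
# running on localhost:8000; the API key is a dummy value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
    prompt=(
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "Tell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    max_tokens=128,
)
print(completion.choices[0].text)
```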

## Evaluations
| __English__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:--------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.          | 76.19                      | 75.14                  | 66.97                     |
| ARC           | 71.6                       | 70.7                   | 62.5                      |
| Hellaswag     | 77.3                       | 76.4                   | 70.3                      |
| MMLU          | 79.66                      | 78.33                  | 68.11                     |

| __French__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:-------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.         | 70.97                      | 70.27                  | 57.73                     |
| ARC_fr       | 65.0                       | 64.7                   | 53.3                      |
| Hellaswag_fr | 72.4                       | 71.4                   | 61.7                      |
| MMLU_fr      | 75.5                       | 74.7                   | 58.2                      |

| __German__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:-------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.         | 68.43                      | 66.93                  | 53.47                     |
| ARC_de       | 64.2                       | 62.6                   | 49.1                      |
| Hellaswag_de | 67.8                       | 66.7                   | 55.0                      |
| MMLU_de      | 73.3                       | 71.5                   | 56.3                      |

| __Italian__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:--------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.          | 70.17                      | 68.63                  | 56.73                     |
| ARC_it        | 64.0                       | 62.1                   | 51.6                      |
| Hellaswag_it  | 72.6                       | 71.0                   | 61.3                      |
| MMLU_it       | 73.9                       | 72.8                   | 57.3                      |

| __Safety__          | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:--------------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.                | 64.28                      | 63.64                  | 61.42                     |
| RealToxicityPrompts | 97.9                       | 98.1                   | 97.2                      |
| TruthfulQA          | 61.91                      | 59.91                  | 51.65                     |
| CrowS               | 33.04                      | 32.92                  | 35.42                     |

| __Spanish__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:--------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.          | 72.5                       | 71.3                   | 59.0                      |
| ARC_es        | 66.7                       | 65.7                   | 54.1                      |
| Hellaswag_es  | 75.8                       | 74.0                   | 63.8                      |
| MMLU_es       | 75.0                       | 74.2                   | 59.1                      |

Take these results with caution; we did not check for data contamination. Evaluation was done using the [Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000` for large datasets.
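
For reference, a run of this kind can be reproduced roughly as follows with the harness's Python API. This is a sketch under assumptions (harness version >= 0.4, English task names, default few-shot settings); the exact tasks and settings used for the tables above are not reproduced here.
```
# Rough sketch of an Evaluation Harness run (lm-evaluation-harness >= 0.4).
# Task names and backend are assumptions; limit=1000 mirrors the note above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
    limit=1000,
)
print(results["results"])
```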
    
## Performance
| __Llama-3 70B Instruct__   | __requests/s__   | __tokens/s__   |
|:---------------------------|:-----------------|:---------------|
| NVIDIA L40Sx4              | 2.38             | 1135.41        |

| __Llama 3 70B GPTQ__   | __requests/s__   | __tokens/s__   |
|:-----------------------|:-----------------|:---------------|
| NVIDIA L40Sx2          | 2.0              | 951.28         |

| __Llama-3 8B Instruct__   | __requests/s__   | __tokens/s__   |
|:--------------------------|:-----------------|:---------------|
| NVIDIA L40Sx1             | 11.64            | 5548.63        |
| NVIDIA L4x1               | 2.76             | 1315.25        |
| NVIDIA L4x2               | 4.79             | 2283.53        |

Performance was measured on [cortecs.ai](https://cortecs.ai).