---
license: llama3
datasets: wikitext
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized model of [Llama-3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), created with GPTQ as developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/), using the following configuration:
- Bits: 4 (an 8-bit version will follow)
- Act order: True
- Group size: 128
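
For illustration only, the settings above roughly correspond to the following `auto_gptq` quantization config; the package choice, base-model path, and calibration text are assumptions, not the exact script used to produce this repository:
```python
# Illustrative only: the 4-bit / act-order / group-size-128 settings expressed
# with the auto_gptq package. Calibration data and package choice are
# assumptions, not the configuration actually used for this repository.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "meta-llama/Meta-Llama-3-70B-Instruct"

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights (8-bit will follow)
    group_size=128,  # group size: 128
    desc_act=True,   # act order: True
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
# GPTQ calibrates on a small set of sample texts; a single sentence is used
# here only to keep the sketch short (the card lists wikitext as the dataset).
examples = [tokenizer("GPTQ calibrates quantization on a few sample texts.")]

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("Meta-Llama-3-70B-Instruct-GPTQ")
```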

## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
```
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
```
Access the model:
```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
```
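
Since the server is OpenAI-compatible, it can also be queried from Python. A minimal sketch using the `openai` client (the API key is a placeholder; vLLM does not check it unless one is configured):
```python
# Query the running vLLM server through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # the server started above
    api_key="EMPTY",                      # placeholder; no key is required by default
)

completion = client.completions.create(
    model="cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
    prompt="San Francisco is a",
    max_tokens=64,
)
print(completion.choices[0].text)
```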

## Evaluations
| __English__   | __Llama-3 70B Instruct__ | __Llama 3 70B Instruct GPTQ__ | __Mixtral Instruct__ |
|:--------------|:-------------------------|:------------------------------|:---------------------|
| Avg.          | 76.19                    | 75.14                         | 73.17                |
| ARC           | 71.6                     | 70.7                          | 71.0                 |
| Hellaswag     | 77.3                     | 76.4                          | 77.0                 |
| MMLU          | 79.66                    | 78.33                         | 71.52                |

| __French__    | __Llama-3 70B Instruct__ | __Llama 3 70B Instruct GPTQ__ | __Mixtral Instruct__ |
|:--------------|:-------------------------|:------------------------------|:---------------------|
| Avg.          | 70.97                    | 70.27                         | 68.7                 |
| ARC_fr        | 65.0                     | 64.7                          | 63.9                 |
| Hellaswag_fr  | 72.4                     | 71.4                          | 77.1                 |
| MMLU_fr       | 75.5                     | 74.7                          | 65.1                 |

| __German__    | __Llama-3 70B Instruct__ | __Llama 3 70B Instruct GPTQ__ | __Mixtral Instruct__ |
|:--------------|:-------------------------|:------------------------------|:---------------------|
| Avg.          | 68.43                    | 66.93                         | 66.47                |
| ARC_de        | 64.2                     | 62.6                          | 62.8                 |
| Hellaswag_de  | 67.8                     | 66.7                          | 72.1                 |
| MMLU_de       | 73.3                     | 71.5                          | 64.5                 |

| __Italian__   | __Llama-3 70B Instruct__ | __Llama 3 70B Instruct GPTQ__ | __Mixtral Instruct__ |
|:--------------|:-------------------------|:------------------------------|:---------------------|
| Avg.          | 70.17                    | 68.63                         | 67.17                |
| ARC_it        | 64.0                     | 62.1                          | 63.8                 |
| Hellaswag_it  | 72.6                     | 71.0                          | 75.6                 |
| MMLU_it       | 73.9                     | 72.8                          | 62.1                 |

| __Safety__          | __Llama-3 70B Instruct__ | __Llama 3 70B Instruct GPTQ__ | __Mixtral Instruct__ |
|:--------------------|:-------------------------|:------------------------------|:---------------------|
| Avg.                | 64.28                    | 63.64                         | 63.56                |
| RealToxicityPrompts | 97.9                     | 98.1                          | 93.2                 |
| TruthfulQA          | 61.91                    | 59.91                         | 64.61                |
| CrowS               | 33.04                    | 32.92                         | 32.86                |

| __Spanish__   | __Llama-3 70B Instruct__ | __Llama 3 70B Instruct GPTQ__ | __Mixtral Instruct__ |
|:--------------|:-------------------------|:------------------------------|:---------------------|
| Avg.          | 72.5                     | 71.3                          | 68.8                 |
| ARC_es        | 66.7                     | 65.7                          | 64.4                 |
| Hellaswag_es  | 75.8                     | 74.0                          | 77.5                 |
| MMLU_es       | 75.0                     | 74.2                          | 64.6                 |

Take these results with caution; we did not check for data contamination. Evaluation was done using the [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000` for large datasets.
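
As a rough guide, such a run could be reproduced with the harness's Python API; the task list and model backend below are illustrative assumptions, not the exact setup used:
```python
# Illustrative sketch of an Eval. Harness run with limit=1000; the task list
# and model backend are assumptions, not the exact evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
    limit=1000,  # cap examples per task, as done for the large datasets
)
print(results["results"])
```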
    
## Performance
| Hardware       |   requests/s |   tokens/s |
|:---------------|-------------:|-----------:|
| 2x NVIDIA L40S |            2 |     951.28 |