---
datasets: wikitext
license: other
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), produced with GPTQ, a method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/), using the following configuration:
- Bits: 4
- Act order: True
- Group size: 128

## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
    
```
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-8B-Instruct-GPTQ
```
Access the model:
```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-8B-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
```
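
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server from the command above is listening on `localhost:8000`; the helper names `build_completion_request` and `complete` and the `max_tokens` default are illustrative choices, not part of the model or of vLLM.

```python
import json
import urllib.request

# Assumed values, taken from the commands above; adjust to your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cortecs/Meta-Llama-3-8B-Instruct-GPTQ"


def build_completion_request(prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 32) -> str:
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses list the generated texts under "choices".
    return body["choices"][0]["text"]
```

Once the server is running, `complete("San Francisco is a")` should return the generated continuation as a string.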

## Evaluations
| __English__   | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__   | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__   | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__   |
|:--------------|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------|
| Avg.          | 66.97                                                                                        | 67.0                                                                                                      | 63.52                                                                                               |
| ARC           | 62.5                                                                                         | 62.5                                                                                                      | 54.6                                                                                                |
| Hellaswag     | 70.3                                                                                         | 70.3                                                                                                      | 69.5                                                                                                |
| MMLU          | 68.11                                                                                        | 68.21                                                                                                     | 66.46                                                                                               |
|               |                                                                                              |                                                                                                           |                                                                                                     |
| __French__   | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__   | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__   | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__   |
| Avg.         | 57.73                                                                                        | 57.7                                                                                                      | 53.33                                                                                               |
| Hellaswag_fr | 61.7                                                                                         | 62.2                                                                                                      | 59.3                                                                                                |
| ARC_fr       | 53.3                                                                                         | 53.1                                                                                                      | 46.4                                                                                                |
| MMLU_fr      | 58.2                                                                                         | 57.8                                                                                                      | 54.3                                                                                                |
|              |                                                                                              |                                                                                                           |                                                                                                     |
| __German__   | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__   | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__   | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__   |
| Avg.         | 53.47                                                                                        | 53.67                                                                                                     | 49.0                                                                                                |
| ARC_de       | 49.1                                                                                         | 49.0                                                                                                      | 41.6                                                                                                |
| Hellaswag_de | 55.0                                                                                         | 55.2                                                                                                      | 53.3                                                                                                |
| MMLU_de      | 56.3                                                                                         | 56.8                                                                                                      | 52.1                                                                                                |
|              |                                                                                              |                                                                                                           |                                                                                                     |
| __Italian__   | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__   | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__   | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__   |
| Avg.          | 56.73                                                                                        | 56.67                                                                                                     | 51.3                                                                                                |
| Hellaswag_it  | 61.3                                                                                         | 61.3                                                                                                      | 58.4                                                                                                |
| MMLU_it       | 57.3                                                                                         | 57.0                                                                                                      | 53.0                                                                                                |
| ARC_it        | 51.6                                                                                         | 51.7                                                                                                      | 42.5                                                                                                |
|               |                                                                                              |                                                                                                           |                                                                                                     |
| __Safety__          | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__   | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__   | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__   |
| Avg.                | 61.42                                                                                        | 61.42                                                                                                     | 61.53                                                                                               |
| RealToxicityPrompts | 97.2                                                                                         | 97.2                                                                                                      | 97.2                                                                                                |
| TruthfulQA          | 51.65                                                                                        | 51.58                                                                                                     | 51.98                                                                                               |
| CrowS               | 35.42                                                                                        | 35.48                                                                                                     | 35.42                                                                                               |
|                     |                                                                                              |                                                                                                           |                                                                                                     |
| __Spanish__   |   __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ |   __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ |   __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
| Avg.          |                                                                                         59   |                                                                                                     58.63 |                                                                                                54.6 |
| ARC_es        |                                                                                         54.1 |                                                                                                     53.8  |                                                                                                46.9 |
| Hellaswag_es  |                                                                                         63.8 |                                                                                                     63.3  |                                                                                                60.3 |
| MMLU_es       |                                                                                         59.1 |                                                                                                     58.8  |                                                                                                56.6 |

We did not check for data contamination.
Evaluation was done with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000`, which caps each task at 1,000 examples.
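
The `Avg.` rows appear to be the unweighted mean of the three task scores in each block. As a quick sanity check, the sketch below recomputes the English averages from the table and the gap between each quantized model and the unquantized baseline. The variable and function names are illustrative only.

```python
# Scores copied from the English rows of the evaluation table.
english_scores = {
    "Meta-Llama-3-8B-Instruct": {"ARC": 62.5, "Hellaswag": 70.3, "MMLU": 68.11},
    "Meta-Llama-3-8B-Instruct-GPTQ-8b": {"ARC": 62.5, "Hellaswag": 70.3, "MMLU": 68.21},
    "Meta-Llama-3-8B-Instruct-GPTQ": {"ARC": 54.6, "Hellaswag": 69.5, "MMLU": 66.46},
}


def average(task_scores: dict) -> float:
    """Unweighted mean over tasks, rounded to two decimals like the table."""
    return round(sum(task_scores.values()) / len(task_scores), 2)


averages = {model: average(tasks) for model, tasks in english_scores.items()}

# Points lost relative to the unquantized baseline (negative means a gain).
baseline = averages["Meta-Llama-3-8B-Instruct"]
drops = {model: round(baseline - avg, 2) for model, avg in averages.items()}

for model in english_scores:
    print(f"{model}: avg={averages[model]}, drop={drops[model]}")
```

On the English tasks this reproduces the table's averages: the 8-bit model is on par with the baseline, while this 4-bit model loses about 3.5 points, most of it on ARC.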
    
## Performance
| GPU         |   requests/s |   tokens/s |
|:------------|-------------:|-----------:|
| NVIDIA L4x1 |         3.96 |    1887.55 |
| NVIDIA L4x2 |         4.87 |    2323.34 |
| NVIDIA L4x4 |         5.61 |    2674.18 |

Performance was measured on [cortecs inference](https://cortecs.ai).
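
To put the throughput numbers in perspective, the sketch below derives the average tokens per request and the speedup over a single GPU. It assumes that `L4xN` denotes N NVIDIA L4 GPUs; the function name `scaling_summary` is illustrative.

```python
# Throughput copied from the table: GPU count -> (requests/s, tokens/s).
# Assumption: "L4xN" means N NVIDIA L4 GPUs.
throughput = {1: (3.96, 1887.55), 2: (4.87, 2323.34), 4: (5.61, 2674.18)}


def scaling_summary(measurements: dict) -> dict:
    """Derive tokens per request, speedup, and per-GPU efficiency."""
    base_tokens_per_s = measurements[1][1]
    summary = {}
    for gpus, (requests_per_s, tokens_per_s) in measurements.items():
        speedup = tokens_per_s / base_tokens_per_s
        summary[gpus] = {
            "tokens_per_request": tokens_per_s / requests_per_s,
            "speedup": speedup,
            "efficiency": speedup / gpus,  # 1.0 would be perfect linear scaling
        }
    return summary


for gpus, stats in scaling_summary(throughput).items():
    print(
        f"{gpus} GPU(s): {stats['tokens_per_request']:.0f} tokens/request, "
        f"{stats['speedup']:.2f}x speedup, {stats['efficiency']:.0%} efficiency"
    )
```

Under that assumption, going from one to two and four GPUs yields roughly 1.23x and 1.42x the single-GPU token throughput, so scaling is clearly sublinear for a model of this size.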