---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3

model-index:
- name: Mistral-7B-Instruct-v0.3-GPTQ-4bit
  results:
  # AI2 Reasoning Challenge (25-Shot)
  - task: 
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
       - type: acc_norm
         name: normalized accuracy
         value: 63.40
  # HellaSwag (10-shot)
  - task: 
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
       - type: acc_norm
         name: normalized accuracy
         value: 84.04
  # TruthfulQA (0-shot)
  - task: 
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
       - type: mc2
         value: 57.48
  # GSM8k (5-shot)
  - task: 
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
       - type: acc
         name: accuracy
         value: 45.41
  # MMLU (5-Shot)
  - task: 
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
       - type: acc
         name: accuracy
         value: 61.07
  # Winogrande (5-shot)
  - task: 
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
       - type: acc
         name: accuracy
         value: 79.08

---

# Model Card for Mistral-7B-Instruct-v0.3 quantized to 4-bit weights

- Weight-only quantization of [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) to 4-bit weights via GPTQ with group_size=128
- The GPTQ settings were chosen for high accuracy recovery: this model retains 99.75% of the unquantized model's average Open LLM Leaderboard accuracy (see the illustrative sketch below)
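
For reference, a comparable 4-bit GPTQ checkpoint can be produced with the AutoGPTQ library. The sketch below is illustrative only: the calibration text and any settings beyond bits=4 and group_size=128 are assumptions, not the exact recipe used for this checkpoint.

```python
# Illustrative GPTQ quantization sketch (AutoGPTQ); the calibration
# data and unlisted settings are assumptions, not this model's recipe.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Settings stated above: 4-bit weight-only quantization, group_size=128.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# A real run would calibrate on a few hundred representative samples.
calibration = [
    tokenizer(
        "Weight-only quantization compresses LLMs with minimal accuracy loss.",
        return_tensors="pt",
    )
]
model.quantize(calibration)

model.save_quantized("Mistral-7B-Instruct-v0.3-GPTQ-4bit")
tokenizer.save_pretrained("Mistral-7B-Instruct-v0.3-GPTQ-4bit")
```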

# Open LLM Leaderboard evaluation scores
|                      | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-4bit<br>(this model) |
| :------------------: | :----------------------: | :------------------------------------------------: |
| arc-c<br>25-shot     | 63.48                    | 63.40                                              |
| mmlu<br>5-shot       | 61.13                    | 60.89                                              |
| hellaswag<br>10-shot | 84.49                    | 84.04                                              |
| winogrande<br>5-shot | 79.16                    | 79.08                                              |
| gsm8k<br>5-shot      | 43.37                    | 45.41                                              |
| truthfulqa<br>0-shot | 59.65                    | 57.48                                              |
| **Average<br>Accuracy**  | **65.21**                    |              **65.05**                                     |
| **Recovery**             | **100%**                     |              **99.75%**                                     |

# vLLM Inference Performance

This model is ready for optimized inference using the Marlin mixed-precision kernels in [vLLM](https://github.com/vllm-project/vllm).
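
For offline batch generation, the model can be loaded directly through vLLM's Python API. A minimal sketch, with an illustrative prompt and sampling settings:

```python
# Minimal offline-inference sketch using vLLM's Python API;
# the prompt and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Mistral-Instruct models expect the [INST] ... [/INST] chat format.
outputs = llm.generate(["[INST] Summarize GPTQ in one sentence. [/INST]"], sampling)
print(outputs[0].outputs[0].text)
```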

To serve the model over an OpenAI-compatible HTTP API, start an inference server with:
```bash
python -m vllm.entrypoints.openai.api_server --model neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit
```
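
Once running, the server exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI client can query it. A minimal sketch using the `openai` Python package, assuming the server is on localhost:

```python
# Query the vLLM server's OpenAI-compatible endpoint
# (localhost:8000 is vLLM's default; the API key is ignored by vLLM).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit",
    messages=[{"role": "user", "content": "What are the benefits of 4-bit quantization?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```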

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/SC_tYXjoS3yIoOYtfqZ2E.png)