Lin-K76 committed on
Commit
904d9b9
1 Parent(s): dcbb731

Update README.md

Files changed (1)
  1. README.md +27 -27
README.md CHANGED
@@ -14,15 +14,15 @@ license: apache-2.0
  - **Model Optimizations:**
  - **Weight quantization:** FP8
  - **Activation quantization:** FP8
- - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Meta-Llama-3-7B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-7B-Instruct), this models is intended for assistant-like chat.
+ - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), this models is intended for assistant-like chat.
  - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
- - **Release Date:** 6/8/2024
- - **Version:** 1.0
+ - **Release Date:** 8/11/2024
+ - **Version:** 1.1
  - **License(s):** [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
  - **Model Developers:** Neural Magic

  Quantized version of [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1).
- It achieves an average score of 78.47 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 79.15.
+ It achieves an average score of 79.04 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 79.93.

  ### Model Optimizations

@@ -88,7 +88,7 @@ examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt

  quantize_config = BaseQuantizeConfig(
  quant_method="fp8",
- activation_scheme="static"
+ activation_scheme="static",
  ignore_patterns=["re:.*lm_head", "re:.*block_sparse_moe.gate"],
  )

@@ -105,7 +105,7 @@ The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8",dtype=auto,gpu_memory_utilization=0.4,add_bos_token=True,max_model_len=4096 \
+ --model_args pretrained="neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8",tensor_parallel_size=4,dtype=auto,gpu_memory_utilization=0.8,add_bos_token=True,max_model_len=4096 \
  --tasks openllm \
  --batch_size auto
  ```
@@ -127,71 +127,71 @@ lm_eval \
  <tr>
  <td>MMLU (5-shot)
  </td>
- <td>77.77
+ <td>77.71
  </td>
- <td>76.08
+ <td>77.03
  </td>
- <td>97.82%
+ <td>99.12%
  </td>
  </tr>
  <tr>
  <td>ARC Challenge (25-shot)
  </td>
- <td>72.70
+ <td>73.38
  </td>
- <td>72.53
+ <td>73.04
  </td>
- <td>99.76%
+ <td>99.54%
  </td>
  </tr>
  <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
- <td>82.03
+ <td>84.99
  </td>
- <td>83.40
+ <td>83.62
  </td>
- <td>101.6%
+ <td>98.39%
  </td>
  </tr>
  <tr>
  <td>Hellaswag (10-shot)
  </td>
- <td>89.08
+ <td>89.24
  </td>
- <td>88.10
+ <td>88.22
  </td>
- <td>98.89%
+ <td>98.86%
  </td>
  </tr>
  <tr>
  <td>Winogrande (5-shot)
  </td>
- <td>85.16
+ <td>85.87
  </td>
- <td>84.37
+ <td>84.93
  </td>
- <td>99.07%
+ <td>98.91%
  </td>
  </tr>
  <tr>
  <td>TruthfulQA (0-shot)
  </td>
- <td>68.14
+ <td>68.41
  </td>
- <td>66.32
+ <td>67.37
  </td>
- <td>97.32%
+ <td>98.48%
  </td>
  </tr>
  <tr>
  <td><strong>Average</strong>
  </td>
- <td><strong>79.15</strong>
+ <td><strong>79.93</strong>
  </td>
- <td><strong>78.47</strong>
+ <td><strong>79.04</strong>
  </td>
- <td><strong>98.88%</strong>
+ <td><strong>98.88%</strong>
  </td>
  </tr>
  </table>
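
The table's column headers fall outside the diff hunk, so the following is a sketch under an assumption: the first numeric column is the unquantized score, the second is the FP8 score, and the third is the FP8 score expressed as a percentage of the unquantized one. The names `SCORES`, `recovery`, `per_task` and `mean_recovery` are illustrative and do not come from the README. With that reading, the updated percentages can be checked as follows:

```python
# Scores copied from the updated (+) rows of the table: (unquantized, quantized).
# The column meaning is an assumption; the headers are not part of the diff.
SCORES = {
    "MMLU (5-shot)": (77.71, 77.03),
    "ARC Challenge (25-shot)": (73.38, 73.04),
    "GSM-8K (5-shot, strict-match)": (84.99, 83.62),
    "Hellaswag (10-shot)": (89.24, 88.22),
    "Winogrande (5-shot)": (85.87, 84.93),
    "TruthfulQA (0-shot)": (68.41, 67.37),
}


def recovery(quantized: float, baseline: float) -> float:
    """Return the quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


# Per-task percentages, then their plain arithmetic mean.
per_task = {task: recovery(q, base) for task, (base, q) in SCORES.items()}
mean_recovery = sum(per_task.values()) / len(per_task)

for task, value in per_task.items():
    print(f"{task}: {value:.2f}%")
print(f"Average: {mean_recovery:.2f}%")
```

Under this assumption the per-task values agree with the updated rows, and the mean of the per-task percentages comes out near the reported 98.88%. Dividing the two averages directly (79.04 / 79.93) gives roughly 98.89%, so the Average row's percentage looks like a mean of the per-task values rather than a ratio of averages, though the README does not say which was used.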