Lin-K76 committed on
Commit a0a38dd · verified · 1 Parent(s): 8006c32

Update README.md

Files changed (1): README.md (+9 -8)
README.md CHANGED
@@ -25,7 +25,7 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
-It achieves an average score of 68.77 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 69.33.
+It achieves an average score of 72.46 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 73.11.
 
 ### Model Optimizations
 
@@ -130,6 +130,7 @@ oneshot(
 ## Evaluation
 
 The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command:
+A modified version of ARC-C was used for evaluations, in line with Llama 3.1's prompting.
 ```
 lm_eval \
   --model vllm \
@@ -163,13 +164,13 @@ lm_eval \
   </td>
  </tr>
  <tr>
-  <td>ARC Challenge (25-shot)
+  <td>ARC Challenge (0-shot)
   </td>
-  <td>60.41
+  <td>83.11
   </td>
-  <td>60.24
+  <td>82.34
   </td>
-  <td>99.72%
+  <td>99.07%
   </td>
  </tr>
  <tr>
@@ -215,11 +216,11 @@ lm_eval \
  <tr>
  <td><strong>Average</strong>
  </td>
-  <td><strong>69.33</strong>
+  <td><strong>73.11</strong>
   </td>
-  <td><strong>68.77</strong>
+  <td><strong>72.46</strong>
   </td>
-  <td><strong>99.20%</strong>
+  <td><strong>99.10%</strong>
   </td>
  </tr>
 </table>
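For context, the `lm_eval` command shown in the diff is truncated by the hunk window. A minimal sketch of a fully expanded invocation is below; the repository id, `max_model_len`, and the use of the `openllm` task group are assumptions for illustration and do not reproduce the modified ARC-C setup the commit mentions.

```
# Hedged sketch of an OpenLLM v1 evaluation with vLLM via lm-evaluation-harness.
# "<this-repo-id>" is a placeholder; engine arguments are illustrative assumptions.
lm_eval \
  --model vllm \
  --model_args pretrained="<this-repo-id>",dtype=auto,add_bos_token=True,max_model_len=4096 \
  --tasks openllm \
  --batch_size auto
```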