nm-research committed
Commit 4329992
1 Parent(s): 11dac13

Update README.md

Files changed (1)
  1. README.md +21 -21
README.md CHANGED
@@ -27,7 +27,7 @@ tags:
  - **Model Developers:** Neural Magic

  Quantized version of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
- It achieves an average score of 58.80 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark version 1 and 35.60 on version 2, whereas the unquantized model achieves 57.50 on version 1 and 35.85 on version 2.
+ It achieves an average score of 73.05 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark version 1 and 41.44 on version 2, whereas the unquantized model achieves 73.16 on version 1 and 41.40 on version 2.

  ### Model Optimizations

@@ -97,39 +97,39 @@ lm_eval \
  </td>
  <td>MMLU (5-shot)
  </td>
- <td>66.27
+ <td>74.24
  </td>
- <td>65.61
+ <td>73.84
  </td>
- <td>99.0%
+ <td>99.5%
  </td>
  </tr>
  <tr>
  <td>ARC Challenge (25-shot)
  </td>
- <td>56.91
+ <td>63.40
  </td>
- <td>57.25
+ <td>63.23
  </td>
- <td>100.6%
+ <td>99.7%
  </td>
  </tr>
  <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
- <td>17.29
+ <td>80.36
  </td>
- <td>28.13
+ <td>80.74
  </td>
- <td>162.7%
+ <td>100.5%
  </td>
  </tr>
  <tr>
  <td>Hellaswag (10-shot)
  </td>
- <td>75.19
+ <td>81.52
  </td>
- <td>74.76
+ <td>81.06
  </td>
  <td>99.4%
  </td>
@@ -137,31 +137,31 @@ lm_eval \
  <tr>
  <td>Winogrande (5-shot)
  </td>
- <td>70.48
+ <td>74.66
  </td>
- <td>69.30
+ <td>74.82
  </td>
- <td>98.3%
+ <td>100.2%
  </td>
  </tr>
  <tr>
  <td>TruthfulQA (0-shot, mc2)
  </td>
- <td>58.84
+ <td>64.76
  </td>
- <td>57.73
+ <td>64.58
  </td>
- <td>101.0%
+ <td>99.7%
  </td>
  </tr>
  <tr>
  <td><strong>Average</strong>
  </td>
- <td><strong>57.50</strong>
+ <td><strong>73.16</strong>
  </td>
- <td><strong>58.80</strong>
+ <td><strong>73.05</strong>
  </td>
- <td><strong>102.3%</strong>
+ <td><strong>99.9%</strong>
  </td>
  </tr>
  <tr>
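
For reference, the "Recovery" column in the table above is the quantized score expressed as a percentage of the unquantized (base) score. Below is a minimal Python sketch, not part of the commit itself, that recomputes the updated column and Average row from the per-task scores in this diff; the column order (base first, quantized second) is an assumption inferred from the recovery values.

```python
# Minimal sketch (illustrative, not from the model card): recompute the
# updated OpenLLM v1 "Recovery" column and Average row from this diff.
# Assumed column order: base (unquantized) score first, quantized second;
# recovery = quantized / base * 100.
scores = {
    # task: (base, quantized)
    "MMLU (5-shot)": (74.24, 73.84),
    "ARC Challenge (25-shot)": (63.40, 63.23),
    "GSM-8K (5-shot, strict-match)": (80.36, 80.74),
    "Hellaswag (10-shot)": (81.52, 81.06),
    "Winogrande (5-shot)": (74.66, 74.82),
    "TruthfulQA (0-shot, mc2)": (64.76, 64.58),
}

# Per-task recovery, matching the table (e.g. MMLU: 99.5%).
for task, (base, quant) in scores.items():
    print(f"{task}: {100 * quant / base:.1f}% recovery")

avg_base = sum(b for b, _ in scores.values()) / len(scores)   # ~73.16
avg_quant = sum(q for _, q in scores.values()) / len(scores)  # ~73.05
# Prints ~99.85%, which the card reports as 99.9%.
print(f"Average: {avg_base:.2f} -> {avg_quant:.2f} "
      f"({100 * avg_quant / avg_base:.2f}% recovery)")
```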