nm-research committed "Update README.md" (commit 4329992, parent: 11dac13)

README.md CHANGED
```diff
@@ -27,7 +27,7 @@ tags:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
-It achieves an average score of
+It achieves an average score of 73.05 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark version 1 and 41.44 on version 2, whereas the unquantized model achieves 73.16 on version 1 and 41.40 on version 2.
 
 ### Model Optimizations
 
@@ -97,39 +97,39 @@ lm_eval \
 </td>
 <td>MMLU (5-shot)
 </td>
-<td>
+<td>74.24
 </td>
-<td>
+<td>73.84
 </td>
-<td>99.
+<td>99.5%
 </td>
 </tr>
 <tr>
 <td>ARC Challenge (25-shot)
 </td>
-<td>
+<td>63.40
 </td>
-<td>
+<td>63.23
 </td>
-<td>
+<td>99.7%
 </td>
 </tr>
 <tr>
 <td>GSM-8K (5-shot, strict-match)
 </td>
-<td>
+<td>80.36
 </td>
-<td>
+<td>80.74
 </td>
-<td>
+<td>100.5%
 </td>
 </tr>
 <tr>
 <td>Hellaswag (10-shot)
 </td>
-<td>
+<td>81.52
 </td>
-<td>
+<td>81.06
 </td>
 <td>99.4%
 </td>
@@ -137,31 +137,31 @@ lm_eval \
 <tr>
 <td>Winogrande (5-shot)
 </td>
-<td>
+<td>74.66
 </td>
-<td>
+<td>74.82
 </td>
-<td>
+<td>100.2%
 </td>
 </tr>
 <tr>
 <td>TruthfulQA (0-shot, mc2)
 </td>
-<td>
+<td>64.76
 </td>
-<td>
+<td>64.58
 </td>
-<td>
+<td>99.7%
 </td>
 </tr>
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>
+<td><strong>73.16</strong>
 </td>
-<td><strong>
+<td><strong>73.05</strong>
 </td>
-<td><strong>
+<td><strong>99.9%</strong>
 </td>
 </tr>
 <tr>
```