update readme
README.md
CHANGED
Note: Due to rounding errors caused by hardware and framework, differences in reproduced results are possible.

#### C-Eval

We report the accuracy of Qwen-1.8B-Chat on the [C-Eval](https://arxiv.org/abs/2305.08322) validation set:

| Model | Acc. |
|:--------------------------------:|:---------:|
| RedPajama-INCITE-Chat-3B | 18.3 |
| OpenBuddy-3B | 23.5 |
| Firefly-Bloom-1B4 | 23.6 |
| OpenLLaMA-Chinese-3B | 24.4 |
| LLaMA2-7B-Chat | 31.9 |
| ChatGLM2-6B-Chat | 52.6 |
| InternLM-7B-Chat | 53.6 |
| **Qwen-1.8B-Chat (0-shot)** | 55.6 |
| **Qwen-7B-Chat (0-shot)** | 59.7 |
| **Qwen-7B-Chat (5-shot)** | 59.3 |

The zero-shot accuracy of Qwen-1.8B-Chat on the C-Eval test set is shown below:

| Model | Avg. | STEM | Social Sciences | Humanities | Others |
| :---------------------: | :------: | :--: | :-------------: | :--------: | :----: |
| Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7 | 43.1 | 41.2 |
| Chinese-Alpaca-2-7B | 40.3 | - | - | - | - |
| ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
| Baichuan-13B-Chat | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
| **Qwen-1.8B-Chat** | 53.8 | 48.4 | 68.0 | 56.5 | 48.3 |
| **Qwen-7B-Chat** | 58.6 | 53.3 | 72.1 | 62.8 | 52.0 |
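
For readers who want to sanity-check numbers like these, the sketch below shows one common way to score chat models on multiple-choice benchmarks: reduce each reply to its first choice letter, then aggregate per category. The item format and letter-extraction rule are illustrative assumptions, not the exact evaluation script used here; note that the overall average is question-weighted, so it need not equal the mean of the category columns.

```python
import re
from collections import defaultdict

def first_choice(reply: str) -> str | None:
    """Reduce a free-form chat reply to its first A-D choice letter."""
    match = re.search(r"[ABCD]", reply.upper())
    return match.group(0) if match else None

def ceval_scores(items):
    """items: iterable of (category, gold_letter, model_reply) triples.
    Returns per-category accuracy plus a question-weighted overall average."""
    hits, totals = defaultdict(int), defaultdict(int)
    for category, gold, reply in items:
        totals[category] += 1
        hits[category] += int(first_choice(reply) == gold)
    scores = {c: 100.0 * hits[c] / totals[c] for c in totals}
    scores["Avg."] = 100.0 * sum(hits.values()) / sum(totals.values())
    return scores
```
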
### English Evaluation

#### MMLU

The accuracy of Qwen-1.8B-Chat on the [MMLU](https://arxiv.org/abs/2009.03300) benchmark is shown below. Qwen-1.8B-Chat remains among the top human-aligned models of comparable size.

| Model | Acc. |
|:--------------------------------:|:---------:|
| Firefly-Bloom-1B4 | 23.8 |
| OpenBuddy-3B | 25.5 |
| RedPajama-INCITE-Chat-3B | 25.5 |
| OpenLLaMA-Chinese-3B | 25.7 |
| ChatGLM2-6B-Chat | 46.0 |
| LLaMA2-7B-Chat | 46.2 |
| InternLM-7B-Chat | 51.1 |
| Baichuan2-7B-Chat | 52.9 |
| **Qwen-1.8B-Chat (0-shot)** | 43.3 |
| **Qwen-7B-Chat (0-shot)** | 55.8 |
| **Qwen-7B-Chat (5-shot)** | 57.0 |
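
The (0-shot) and (5-shot) suffixes indicate how many worked examples precede each test question. As a rough illustration of what that means in practice, here is a minimal k-shot prompt builder for multiple-choice items; the dict fields and template are assumptions for the sketch, not the exact format used in these evaluations.

```python
def build_prompt(question: dict, shots: list[dict]) -> str:
    """Build a k-shot multiple-choice prompt, where k = len(shots).
    Each item: {'question': str, 'choices': [str x4], 'answer': 'A'-'D'}."""
    def fmt(item: dict, with_answer: bool) -> str:
        lines = [item["question"]]
        lines += [f"{letter}. {choice}"
                  for letter, choice in zip("ABCD", item["choices"])]
        lines.append("Answer: " + (item["answer"] if with_answer else ""))
        return "\n".join(lines)

    blocks = [fmt(shot, with_answer=True) for shot in shots]  # worked examples
    blocks.append(fmt(question, with_answer=False))           # the actual query
    return "\n\n".join(blocks)
```
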
### Coding Evaluation

The zero-shot Pass@1 of Qwen-1.8B-Chat on [HumanEval](https://github.com/openai/human-eval) is shown below:

| Model | Pass@1 |
|:------------------------:|:------:|
| Firefly-Bloom-1B4 | 0.6 |
| OpenLLaMA-Chinese-3B | 4.9 |
| RedPajama-INCITE-Chat-3B | 6.1 |
| OpenBuddy-3B | 10.4 |
| ChatGLM2-6B-Chat | 11.0 |
| LLaMA2-7B-Chat | 12.2 |
| Baichuan2-7B-Chat | 13.4 |
| InternLM-7B-Chat | 14.6 |
| **Qwen-1.8B-Chat** | 26.2 |
| **Qwen-7B-Chat** | 37.2 |
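
Pass@1 is the fraction of HumanEval problems whose generated completion passes all of the problem's unit tests. For reference, the standard unbiased pass@k estimator from the HumanEval paper is sketched below; with a single greedy sample per problem (the usual zero-shot setting), pass@1 reduces to plain accuracy. The decoding settings behind the numbers above are not restated here.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    n = samples drawn per problem, c = samples that pass the unit tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```
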
### Mathematics Evaluation

The accuracy of Qwen-1.8B-Chat on GSM8K is shown below:

| Model | Acc. |
|:------------------------------------:|:--------:|
| Firefly-Bloom-1B4 | 2.4 |
| RedPajama-INCITE-Chat-3B | 2.5 |
| OpenLLaMA-Chinese-3B | 3.0 |
| OpenBuddy-3B | 12.6 |
| LLaMA2-7B-Chat | 26.3 |
| ChatGLM2-6B-Chat | 28.8 |
| Baichuan2-7B-Chat | 32.8 |
| InternLM-7B-Chat | 33.0 |
| **Qwen-1.8B-Chat (0-shot)** | 33.7 |
| **Qwen-7B-Chat (0-shot)** | 50.3 |
| **Qwen-7B-Chat (8-shot)** | 54.1 |
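
GSM8K replies are free-form chains of reasoning, so scoring depends on pulling out the final number. Below is a minimal sketch of a common scoring convention, assuming the last number in the reply is the model's answer (GSM8K references mark the gold answer with `####`); the exact extraction rules behind the table above may differ.

```python
import re

def last_number(text: str) -> str | None:
    """Treat the last number in a reply as its final answer."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return numbers[-1].replace(",", "") if numbers else None

def is_correct(reply: str, reference: str) -> bool:
    """GSM8K references end with '#### <gold answer>'."""
    gold = reference.split("####")[-1].strip().replace(",", "")
    prediction = last_number(reply)
    return prediction is not None and float(prediction) == float(gold)
```
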
## Reproduction