update readme

README.md

Note: Due to rounding errors caused by hardware and framework, differences in results are possible.

#### C-Eval

We demonstrate the accuracy of Qwen-1.8B-Chat on the [C-Eval](https://arxiv.org/abs/2305.08322) validation set:

| Model                       | Acc. |
|:---------------------------:|:----:|
| RedPajama-INCITE-Chat-3B    | 18.3 |
| OpenBuddy-3B                | 23.5 |
| Firefly-Bloom-1B4           | 23.6 |
| OpenLLaMA-Chinese-3B        | 24.4 |
| LLaMA2-7B-Chat              | 31.9 |
| ChatGLM2-6B-Chat            | 52.6 |
| InternLM-7B-Chat            | 53.6 |
| **Qwen-1.8B-Chat (0-shot)** | 55.6 |
| **Qwen-7B-Chat (0-shot)**   | 59.7 |
| **Qwen-7B-Chat (5-shot)**   | 59.3 |

The zero-shot accuracy of Qwen-1.8B-Chat on the C-Eval test set is provided below:

| Model                   | Avg. | STEM | Social Sciences | Humanities | Others |
| :---------------------: | :--: | :--: | :-------------: | :--------: | :----: |
| Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7            | 43.1       | 41.2   |
| Chinese-Alpaca-2-7B     | 40.3 | -    | -               | -          | -      |
| ChatGLM2-6B-Chat        | 50.1 | 46.4 | 60.4            | 50.6       | 46.9   |
| Baichuan-13B-Chat       | 51.5 | 43.7 | 64.6            | 56.2       | 49.2   |
| **Qwen-1.8B-Chat**      | 53.8 | 48.4 | 68.0            | 56.5       | 48.3   |
| **Qwen-7B-Chat**        | 58.6 | 53.3 | 72.1            | 62.8       | 52.0   |
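
As a concrete illustration of what a zero-shot query looks like, here is a minimal sketch using the chat interface that Qwen checkpoints expose through 🤗 Transformers. The prompt wording and the answer extraction are illustrative assumptions, not the exact harness behind the numbers above (see the Reproduction section).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen checkpoints ship custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B-Chat", device_map="auto", trust_remote_code=True
).eval()

# A C-Eval-style multiple-choice question, asked zero-shot (no worked examples).
# The question below is a made-up stand-in, not an actual C-Eval item.
query = (
    "以下是一道单项选择题,请直接回答选项字母。\n"
    "力的国际单位是:\nA. 千克 B. 牛顿 C. 焦耳 D. 瓦特\n答案:"
)
response, _ = model.chat(tokenizer, query, history=None)

# Naive answer extraction: take the first A/B/C/D character in the reply.
prediction = next((ch for ch in response if ch in "ABCD"), None)
print(response, prediction)
```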

### English Evaluation

#### MMLU

The accuracy of Qwen-1.8B-Chat on [MMLU](https://arxiv.org/abs/2009.03300) is provided below. The performance of Qwen-1.8B-Chat remains near the top among human-aligned models of comparable size.

| Model                       | Acc. |
|:---------------------------:|:----:|
| Firefly-Bloom-1B4           | 23.8 |
| OpenBuddy-3B                | 25.5 |
| RedPajama-INCITE-Chat-3B    | 25.5 |
| OpenLLaMA-Chinese-3B        | 25.7 |
| ChatGLM2-6B-Chat            | 46.0 |
| LLaMA2-7B-Chat              | 46.2 |
| InternLM-7B-Chat            | 51.1 |
| Baichuan2-7B-Chat           | 52.9 |
| **Qwen-1.8B-Chat (0-shot)** | 43.3 |
| **Qwen-7B-Chat (0-shot)**   | 55.8 |
| **Qwen-7B-Chat (5-shot)**   | 57.0 |
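
The (0-shot)/(5-shot) suffixes above denote how many solved examples are prepended to each test question. As a rough illustration of the difference, here is a generic k-shot prompt builder; the template is a hypothetical sketch, and the actual evaluation scripts may format prompts differently.

```python
def build_kshot_prompt(dev_examples, test_question, k=5):
    """Prepend k solved dev-split examples to the test question.

    With k=0 this degenerates to the zero-shot case: the model sees
    only the test question. The formatting here is illustrative.
    """
    parts = [f"Question: {q}\nAnswer: {a}\n" for q, a in dev_examples[:k]]
    parts.append(f"Question: {test_question}\nAnswer:")
    return "\n".join(parts)

# Toy demonstration with two hypothetical dev examples:
dev = [("2 + 2 = ?", "4"), ("The capital of France is?", "Paris")]
print(build_kshot_prompt(dev, "3 * 7 = ?", k=2))
```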

### Coding Evaluation

The zero-shot Pass@1 of Qwen-1.8B-Chat on [HumanEval](https://github.com/openai/human-eval) is shown below:

| Model                    | Pass@1 |
|:------------------------:|:------:|
| Firefly-Bloom-1B4        | 0.6    |
| OpenLLaMA-Chinese-3B     | 4.9    |
| RedPajama-INCITE-Chat-3B | 6.1    |
| OpenBuddy-3B             | 10.4   |
| ChatGLM2-6B-Chat         | 11.0   |
| LLaMA2-7B-Chat           | 12.2   |
| Baichuan2-7B-Chat        | 13.4   |
| InternLM-7B-Chat         | 14.6   |
| **Qwen-1.8B-Chat**       | 26.2   |
| **Qwen-7B-Chat**         | 37.2   |
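
Pass@1 is the fraction of HumanEval problems for which a generated program passes all unit tests. With a single sample per problem it is a plain average; the commonly used unbiased estimator from the HumanEval paper generalizes it to pass@k when n samples are drawn per problem. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021).

    n: samples generated for a problem
    c: samples among them that pass all unit tests
    k: sampling budget being estimated
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 the estimator reduces to c / n, the per-problem pass rate:
assert abs(pass_at_k(n=10, c=3, k=1) - 0.3) < 1e-12
```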

### Mathematics Evaluation

The accuracy of Qwen-1.8B-Chat on GSM8K is shown below:

| Model                       | Acc. |
|:---------------------------:|:----:|
| Firefly-Bloom-1B4           | 2.4  |
| RedPajama-INCITE-Chat-3B    | 2.5  |
| OpenLLaMA-Chinese-3B        | 3.0  |
| OpenBuddy-3B                | 12.6 |
| LLaMA2-7B-Chat              | 26.3 |
| ChatGLM2-6B-Chat            | 28.8 |
| Baichuan2-7B-Chat           | 32.8 |
| InternLM-7B-Chat            | 33.0 |
| **Qwen-1.8B-Chat (0-shot)** | 33.7 |
| **Qwen-7B-Chat (0-shot)**   | 50.3 |
| **Qwen-7B-Chat (8-shot)**   | 54.1 |
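
GSM8K is typically scored by comparing the final number in the model's generated solution against the reference answer. A minimal sketch of that extraction step, assuming this standard convention; the actual evaluation scripts may normalize differently.

```python
import re

def extract_final_number(completion: str):
    """Return the last number appearing in a generated solution, if any."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def is_correct(completion: str, reference: str) -> bool:
    pred = extract_final_number(completion)
    return pred is not None and float(pred) == float(reference)

# Toy example with a made-up chain-of-thought completion:
print(is_correct("16 - 7 = 9 eggs left, 9 * 2 = 18. The answer is 18.", "18"))  # True
```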

## Reproduction