Commit 40a52f1 by yuyijiong
1 parent: 04cf66c

Upload 2 files

Files changed (2)
  1. README.md +20 -18
  2. README_en.md +20 -17
README.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: text-generation
  ---
  **Read this in other languages: [English](README_en.md), [中文](README.md).**

- * Updated 2023.12.28: Released Qwen-7b-chat-yarn-32k; note that this version is significantly weaker than Qwen-14b-chat-yarn-32k
+ * Updated 2023.12.28: Released Qwen-7b-chat-yarn-32k; note that, possibly because the model is small and its base model is weak, the 7b version is significantly weaker than Qwen-14b-chat-yarn-32k
  * Updated 2023.12.23: Released the evaluation results for passage_retrieval_en in LongBench
  * Updated 2023.12.16: Released the [paper (Chinese version)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/) and the [paper (English version)](https://arxiv.org/abs/2312.11193)
  * Updated 2023.12.14: Released the fine-tuned Qwen-14b-chat-yarn-32k. The fine-tuned model handles Chinese and English question answering at lengths up to 32k (about 40,000 Chinese characters). Compared with the earlier 32k model obtained through position interpolation, it almost completely resolves the low recall in multi-document question-answering tasks (the "lost in middle" phenomenon).
@@ -23,26 +23,28 @@ pipeline_tag: text-generation

  # LongBench test results
  ### Evaluation results for passage_retrieval_zh in LongBench
- | Model | Score (acc) |
- |---------------------------------------------|----------|
- | **Qwen-14b-chat-yarn-32k** |**0.94**|
- | gpt-3.5-turbo-16k | 0.81 |
- | chatglm3-32k | 0.725 |
- | Qwen-14b-chat (use_dynamic_ntk=True) | 0.525 |
- | Qwen-14b-chat-32k-lora | 0.34 |
- | Qwen-7b-chat-yarn-32k | 0.325 |
- | LongAlpaca-7b-32k-chinese-v2 | 0.12 |
- | CausalLM-14b | 0.086 |
+ | Model | Score (acc) |
+ |------------------------------|------------|
+ | **Qwen-14b-chat-yarn-32k** | **0.94** |
+ | gpt-3.5-turbo-16k | 0.81 |
+ | chatglm3-32k | 0.725 |
+ | Qwen-14b-chat | 0.525 |
+ | Qwen-14b-chat-32k-lora | 0.34 |
+ | **Qwen-7b-chat-yarn-32k** | **0.325** |
+ | Qwen-7b-chat | 0.26 |
+ | LongAlpaca-7b-32k-chinese-v2 | 0.12 |
+ | CausalLM-14b | 0.086 |


  ### Evaluation results for passage_retrieval_en in LongBench
- | Model | Score (acc) |
- |----------------------------|----------|
- | **Qwen-14b-chat-yarn-32k** | **0.945** |
- | Qwen-14b-chat | 0.24 |
- | chatglm3-32k | 0.815 |
- | gpt-3.5-turbo-16k | 0.88 |
- | Qwen-7b-chat-yarn-32k |0.47|
+ | Model | Score (acc) |
+ |-----------------------------|------------|
+ | **Qwen-14b-chat-yarn-32k** | **0.945** |
+ | chatglm3-32k | 0.815 |
+ | gpt-3.5-turbo-16k | 0.88 |
+ | **Qwen-7b-chat-yarn-32k** | **0.47** |
+ | Qwen-14b-chat | 0.24 |
+ | Qwen-7b-chat | 0.235 |

  After fine-tuning, Qwen-14b-chat-yarn-32k improves very significantly on multi-document question-answering (or retrieval) tasks and leads other models of the same scale by a wide margin.

README_en.md CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
  ---
  **Read this in other languages: [English](README_en.md), [中文](README.md).**

+ * Updated on December 28, 2023: Release Qwen-7b-chat-yarn-32k, but note that the 7b version may be significantly weaker than Qwen-14b-chat-yarn-32k due to the small model size and weak base model capabilities.
  * Updated on December 23, 2023: Release the evaluation results of passage_retrieval_en in LongBench
  * Updated on December 16, 2023: Release [Paper](https://arxiv.org/abs/2312.11193)
  * Updated on December 14, 2023: We have released the Qwen-14b-chat-yarn-32k model, which has been fine-tuned to handle Chinese and English question-answering tasks with a length of up to 32k (approximately 40,000 Chinese characters). This model addresses the low recall issue in multi-document question-answering tasks (also known as the "lost in middle" phenomenon) that was present in the previous 32k model obtained through position interpolation. <br>
@@ -21,25 +22,27 @@ pipeline_tag: text-generation
  # Evaluation results in LongBench
  ### Evaluation results for passage_retrieval_zh in LongBench

- | Models | Accuracy |
- |--------------------------------|----------|
- | **Qwen-14b-chat-yarn-32k** | **0.94** |
- | gpt-3.5-turbo-16k | 0.81 |
- | chatglm3-32k | 0.725 |
- | Qwen-14b-chat | 0.525 |
- | Qwen-14b-chat-32k-lora | 0.34 |
- | Qwen-7b-chat-yarn-32k | 0.325 |
- | LongAlpaca-7b-32k-chinese-v2 | 0.12 |
- | CausalLM-14b | 0.086 |
+ | Models | Accuracy |
+ |------------------------------|-------------|
+ | **Qwen-14b-chat-yarn-32k** | **0.94** |
+ | gpt-3.5-turbo-16k | 0.81 |
+ | chatglm3-32k | 0.725 |
+ | Qwen-14b-chat | 0.525 |
+ | Qwen-14b-chat-32k-lora | 0.34 |
+ | **Qwen-7b-chat-yarn-32k** | **0.325** |
+ | Qwen-7b-chat | 0.26 |
+ | LongAlpaca-7b-32k-chinese-v2 | 0.12 |
+ | CausalLM-14b | 0.086 |

  ### Evaluation results for passage_retrieval_en in LongBench
- | Models | Accuracy |
- |-----------------------------|-----------|
- | **Qwen-14b-chat-yarn-32k** | **0.945** |
- | chatglm3-32k | 0.815 |
- | gpt-3.5-turbo-16k | 0.88 |
- | Qwen-7b-chat-yarn-32k | 0.47 |
- | Qwen-14b-chat | 0.24 |
+ | Models | Accuracy |
+ |----------------------------------|---------------|
+ | **Qwen-14b-chat-yarn-32k** | **0.945** |
+ | chatglm3-32k | 0.815 |
+ | gpt-3.5-turbo-16k | 0.88 |
+ | **Qwen-7b-chat-yarn-32k** | **0.47** |
+ | Qwen-14b-chat | 0.24 |
+ | Qwen-7b-chat | 0.235 |


  Qwen-14b-chat-yarn-32k has shown significant improvement in multi-document question-answering (or retrieval) tasks and outperforms other models of similar scale.
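
To put a number on the "significant improvement" claim, the accuracy values in the updated tables can be compared directly. The following is a minimal sketch, not part of the commit: the `SCORES` dictionary and the `yarn_gains` helper are hypothetical names introduced here, and it assumes that each `-yarn-32k` row was fine-tuned from the similarly named base row (`Qwen-14b-chat`, `Qwen-7b-chat`).

```python
# Accuracy values copied from the updated passage_retrieval tables in this commit.
# Assumption: "<name>-yarn-32k" is the fine-tuned counterpart of base model "<name>".
SCORES = {
    "passage_retrieval_zh": {
        "Qwen-14b-chat": 0.525,
        "Qwen-14b-chat-yarn-32k": 0.94,
        "Qwen-7b-chat": 0.26,
        "Qwen-7b-chat-yarn-32k": 0.325,
    },
    "passage_retrieval_en": {
        "Qwen-14b-chat": 0.24,
        "Qwen-14b-chat-yarn-32k": 0.945,
        "Qwen-7b-chat": 0.235,
        "Qwen-7b-chat-yarn-32k": 0.47,
    },
}


def yarn_gains(scores):
    """Return {task: {base_model: gain}}, where gain = fine-tuned acc - base acc."""
    gains = {}
    for task, table in scores.items():
        gains[task] = {}
        for model, acc in table.items():
            tuned = f"{model}-yarn-32k"
            if tuned in table:  # only base models that have a fine-tuned row
                gains[task][model] = round(table[tuned] - acc, 3)
    return gains


if __name__ == "__main__":
    for task, per_model in yarn_gains(SCORES).items():
        for model, gain in per_model.items():
            print(f"{task}: {model} -> {model}-yarn-32k: {gain:+.3f}")
```

Under that assumption, the absolute gain for the 14b model is far larger than for the 7b model on both tasks, which is consistent with the December 28 note that the 7b version is significantly weaker.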