Text Generation · Transformers · Safetensors · Chinese · English · qwen · conversational · custom_code
yuyijiong committed on
Commit 91ec89b · 1 Parent(s): 2ab8157

Upload 2 files

Files changed (2)
  1. README.md +14 -2
  2. README_en.md +15 -1
README.md CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
---
**Read this in other languages: [English](README_en.md), [中文](README.md).**

+ * Update 2023.12.30: added "Needle in a Haystack" test results
* Update 2023.12.28: released Qwen-7b-chat-yarn-32k; note that, likely because of its smaller size and weaker base model, the 7b version is clearly weaker than Qwen-14b-chat-yarn-32k
* Update 2023.12.23: released the passage_retrieval_en evaluation results from LongBench
* Update 2023.12.16: released the [paper (Chinese)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/) and the [paper (English)](https://arxiv.org/abs/2312.11193)
@@ -19,6 +20,11 @@ pipeline_tag: text-generation

# Qwen-14b-chat model supporting 32k context

+ ## Main features of the model:
+ * Based on Qwen-14b-chat, instruction-tuned with an "original-text restatement" task
+ * Uses the YaRN interpolation method so the model can handle 32k or even longer contexts
+ * At inference time, no special prompt is needed to get highly accurate answers
+
<br>

# LongBench test results
@@ -48,11 +54,17 @@ pipeline_tag: text-generation

After fine-tuning, Qwen-14b-chat-yarn-32k improves dramatically on multi-document question-answering (and retrieval) tasks, far ahead of other models of the same scale.

+
+ # "Needle in a Haystack" test results
+ ![](大海捞针50k.png)
+ * Even at 50k context length (although no training sample exceeds 32k), retrieval accuracy remains very high, showing that the model really does have strong long-context ability, greatly alleviates the "lost in the middle" problem, and still has considerable room for further extension.
+ * Moreover, at inference time the model does not need to "restate the original text": given only the question, it answers directly and correctly. (By contrast, claude2.1-200k needs a specific prompt to answer correctly.) This also demonstrates the model's strength.
+
<br>

# Usage
* Loading this model automatically sets ```config.use_logn_attn=False``` and ```config.use_dynamic_ntk=True```; this produces a warning but does not affect use of the model.
- * For long-text tasks, put the long reference text first and the user's question after it whenever possible.
+ * For long-text tasks, put the long reference text first and the user's question after it whenever possible. It is also best to prefix the question with **"问题:"** or **"Question: "** (see the multi-document QA example below) so the model can better distinguish the reference text from the user's question.
* Be sure to install ```flash-attention2```; otherwise inference on long texts is extremely slow and may raise errors.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
@@ -77,7 +89,7 @@ print(response)
### 3. Instruction fine-tuning
* The Qwen model was fine-tuned with the QLoRA method on the [yuyijiong/Long-Instruction-Chinese](https://huggingface.co/datasets/yuyijiong/Long-Instruction-Chinese) data.

- * More training details are given in the paper.
+ * More details are given in the paper.

<br>

 
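The instruction fine-tuning hunk above only names the method (QLoRA on [yuyijiong/Long-Instruction-Chinese](https://huggingface.co/datasets/yuyijiong/Long-Instruction-Chinese)) and defers details to the paper. The following is a minimal sketch of what such a QLoRA setup could look like, assuming `peft`, `bitsandbytes`, and `datasets` are installed; the base checkpoint id, LoRA target module names, and hyperparameters are illustrative assumptions, not the authors' actual training configuration.

```python
# Minimal QLoRA sketch (illustrative; not the authors' actual training script).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "Qwen/Qwen-14B-Chat"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention / MLP projections
# (module names assume Qwen's c_attn / c_proj / w1 / w2 naming)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Long-instruction data released by the author; tokenization, packing to 32k
# tokens, and the Trainer arguments are omitted from this sketch.
dataset = load_dataset("yuyijiong/Long-Instruction-Chinese")
```

With this setup only the low-rank adapter weights are trained while the quantized 14b base stays frozen, which is what makes fine-tuning at this scale feasible on limited GPU memory.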
README_en.md CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
---
**Read this in other languages: [English](README_en.md), [中文](README.md).**

+ * Updated on December 30, 2023: "Needle in a Haystack" test results
* Updated on December 28, 2023: Released Qwen-7b-chat-yarn-32k, but note that the 7b version may be significantly weaker than Qwen-14b-chat-yarn-32k due to its smaller size and weaker base model.
* Updated on December 23, 2023: Released the passage_retrieval_en evaluation results from LongBench
* Updated on December 16, 2023: Released the [paper](https://arxiv.org/abs/2312.11193)
@@ -19,6 +20,11 @@ pipeline_tag: text-generation

# Qwen-14b-chat model with 32k context window

+ ## Main features of the model:
+ * Based on Qwen-14b-chat, fine-tuned with an "original text restatement" task
+ * Uses the YaRN interpolation method, so the model can adapt to 32k or even longer contexts
+ * During inference, the model gives high-accuracy answers without specially designed prompts
+
# Evaluation results in LongBench
### Evaluation results for passage_retrieval_zh in LongBench

@@ -46,11 +52,19 @@ pipeline_tag: text-generation


Qwen-14b-chat-yarn-32k shows a very significant improvement on multi-document question-answering (and retrieval) tasks after fine-tuning, and outperforms other models of similar scale by a wide margin.
+
+
+ # "Needle in a Haystack" test results
+ ![](大海捞针50k.png)
+ * The model retrieves the needle accurately even at a context length of 50k or longer (although no training sample exceeds 32k), showing that it has strong long-context capabilities and greatly alleviating the "lost in the middle" problem.
+ * In addition, the model does not need to restate the original text during inference: given only the question, it answers directly and correctly. (By contrast, claude2.1-200k needs a specific prompt to answer correctly.) This also demonstrates the model's strength.
+
+
<br>

# Usage
* Loading this model automatically sets ```config.use_logn_attn=False``` and ```config.use_dynamic_ntk=True```; this produces a warning, but it does not affect the model's behavior.
- * For tasks involving long texts, it is recommended to place the long reference text before the user's question.
+ * For tasks involving long texts, it is recommended to place the long reference text before the user's question. It is also best to add a prefix such as **"Question: "** before the question, so that the model can better distinguish the reference text from the user's question.
* Please make sure to install ```flash-attention2```; otherwise inference on long texts will be extremely slow and errors may occur.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
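# ---------------------------------------------------------------------------
# Illustrative sketch: the diff context ends at the import line above, so
# everything below is an assumed usage example. The model id, prompt wording,
# and the chat() call follow the upstream Qwen-14b-chat interface and are not
# the repository's exact snippet.
# ---------------------------------------------------------------------------
import torch

model_path = "yuyijiong/Qwen-14b-chat-yarn-32k"  # assumed repository id

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    torch_dtype=torch.bfloat16,  # flash-attention2 is used automatically if installed
    device_map="auto",
    trust_remote_code=True,
).eval()

# Long reference text first, the user's question last, prefixed with "Question: "
# so the model can separate the reference documents from the question,
# as recommended in the Usage notes above.
reference = "Document 1: ...\n\nDocument 2: ...\n\nDocument 3: ..."
prompt = reference + "\n\nQuestion: Which document mentions the 32k context window?"

# Qwen's remote code exposes a chat() helper returning the answer and history.
response, history = model.chat(tokenizer, prompt, history=None)
print(response)
```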