IDEA-CCNL/Ziya-Reader-13B-v1.0

@@ -21,7 +21,7 @@ pipeline_tag: question-answering
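 ## Brief Introduction
-Ziya-Reader-13B-v1.0 is a knowledge question-answering model given a question and knowledge documents, it answers the question accurately, and it supports both multi-document and single-document QA. The model has an 8k context window; compared with other models that have longer windows, it comes out ahead in evaluations on several long-text tasks, including multi-document QA, a synthetic task (document retrieval), and long-text summarization.
 The model mainly targets scenarios such as knowledge-base QA, retrieval QA, and e-commerce customer service. It performs well on private-domain knowledge QA and can be applied broadly in vertical domains such as law, finance, and healthcare, because it addresses a known problem in multi-document QA: answer accuracy drops sharply when the correct information is not in the first or last document.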
@@ -34,6 +34,8 @@ Additionally, the model also demonstrates excellent generalization capabilities,
 It is built on the 13B Llama2 model and fine-tuned on several hundred thousand samples of general-purpose and retrieval QA data.
 ## Evaluation
 Longbench Chinese
@@ -48,6 +50,18 @@ Longbench Chinese
 |Vicuna-v1.5-7B-16k|19.3|5.0|15.1|
 |Ziya-Reader-13B-v1.0| **42.8**| **66.0**|**15.3**|
 |model|LongBench Chinese Multi-doc QA (%)|LongBench Chinese Multi-doc QA shuffled (%)|
  |:---|:---:|:---:|
  |gpt3.5-turbo-16k | 28.7 | 23.1|
@@ -55,6 +69,10 @@ Longbench Chinese
  |Baichuan-13B-Chat2 | 32.4 | 27.2 |
  |Ziya-Reader-13B-v1.0| **42.8** | **40.9**|
 ## Model Taxonomy
 | Demand | Task | Series | Model | Parameter | Extra |
@@ -64,7 +82,8 @@ Longbench Chinese
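 ## Model Information
 We used position interpolation (PI) and fine-tuned on a curated long-document corpus to extend the context window to 8k. Second, because a model is shaped by its data, we selected high-quality samples from nearly ten million examples; data on the order of 100k samples, obtained through several rounds of filtering, was enough to turn an ordinary model into a compact but capable knowledge QA model. In addition, we designed a special task tailored to search and carefully constructed data for it, so that the model learns to find the relevant documents among the candidates and answer the question.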
 ## Usage
 ### Environment

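 ## Brief Introduction
+Ziya-Reader-13B-v1.0 is a knowledge question-answering model. Given a question and knowledge documents, it answers the question accurately, and it supports both multi-document and single-document QA. The model has an 8k context window; compared with other models that have longer windows, it comes out ahead in evaluations on several long-text tasks, including multi-document QA, a synthetic task (document retrieval), and long-text summarization.
 The model mainly targets scenarios such as knowledge-base QA, retrieval QA, and e-commerce customer service. It performs well on private-domain knowledge QA and can be applied broadly in vertical domains such as law, finance, and healthcare, because it addresses a known problem in multi-document QA: answer accuracy drops sharply when the correct information is not in the first or last document.
 It is built on the 13B Llama2 model and fine-tuned on several hundred thousand samples of general-purpose and retrieval QA data.
 ## Evaluation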
 Longbench Chinese
 |Vicuna-v1.5-7B-16k|19.3|5.0|15.1|
 |Ziya-Reader-13B-v1.0| **42.8**| **66.0**|**15.3**|
+"Multi-doc QA" is a multi-document question-answering task: given a question and multiple documents, the model answers the question based on the documents that contain the correct information. This task measures the model's relevance judgment, memory, and question-answering ability. On this task, Ziya-Reader-13B-v1.0 leads all other models by a large margin, including models with longer context windows.
+"Synthetic task" is a synthetic document retrieval task: given a summary, the goal is to find the corresponding document among a large number of documents. This task evaluates the model's semantic matching ability. On this task, our model surpasses all open-source models, reaching 66%.
+"Summarization" is a long-text summarization task: given meeting records containing multiple speakers, the model generates a summary of the meeting from an extremely long context. On this task our model is very competitive: with only an 8k context window, it is within 1% of models with 16k or longer windows and is the strongest among 8k-window models.
 |model|LongBench Chinese Multi-doc QA (%)|LongBench Chinese Multi-doc QA shuffled (%)|
  |:---|:---:|:---:|
  |gpt3.5-turbo-16k | 28.7 | 23.1|
  |Baichuan-13B-Chat2 | 32.4 | 27.2 |
  |Ziya-Reader-13B-v1.0| **42.8** | **40.9**|
+We found that the documents in Multi-doc QA are arranged in descending order of relevance, so the correct answer is usually in the first document or one of the first few. As a result, the benchmark does not truly reflect a model's relevance judgment. We therefore shuffled the document order in this test set and re-evaluated each model. Most current models degraded significantly, by 5% to 17%, whereas our model proved robust, with a drop of less than 2%.
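The shuffled evaluation described above can be sketched in a few lines. This is only an illustration, not the authors' evaluation script: the function names `shuffle_documents` and `score_drop`, the fixed seed, and the placeholder document strings are all assumptions made for this example. The scores are the ones reported in the table above.

```python
import random

def shuffle_documents(documents, seed=0):
    """Return a shuffled copy of the candidate documents (input is left untouched).

    A fixed seed (an assumption for this sketch) keeps the shuffled order
    identical across models, so their scores stay comparable.
    """
    rng = random.Random(seed)
    shuffled = list(documents)
    rng.shuffle(shuffled)
    return shuffled

def score_drop(original_score, shuffled_score):
    """Absolute drop in percentage points between the two settings."""
    return round(original_score - shuffled_score, 1)

# Placeholder documents, ordered from most to least relevant as in the original set.
docs = ["doc_most_relevant", "doc_2", "doc_3", "doc_4", "doc_least_relevant"]
shuffled_docs = shuffle_documents(docs, seed=42)

# (original, shuffled) scores taken from the table above.
reported = {
    "gpt3.5-turbo-16k": (28.7, 23.1),
    "Baichuan-13B-Chat2": (32.4, 27.2),
    "Ziya-Reader-13B-v1.0": (42.8, 40.9),
}
drops = {name: score_drop(orig, shuf) for name, (orig, shuf) in reported.items()}

if __name__ == "__main__":
    print(shuffled_docs)
    for name, drop in drops.items():
        print(f"{name}: -{drop} points after shuffling")
```

Shuffling a copy rather than the list in place keeps the original relevance-ordered set available for the unshuffled baseline run.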
 ## Model Taxonomy
 | Demand | Task | Series | Model | Parameter | Extra |
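 ## Model Information
 We used position interpolation (PI) and fine-tuned on a curated long-document corpus to extend the context window to 8k. Second, because a model is shaped by its data, we selected high-quality samples from nearly ten million examples; data on the order of 100k samples, obtained through several rounds of filtering, was enough to turn an ordinary model into a compact but capable knowledge QA model. In addition, we designed a special task tailored to search and carefully constructed data for it, so that the model learns to find the relevant documents among the candidates and answer the question.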
+For more details, please read our WeChat official account article: [姜子牙大模型系列 | 为知识检索而生，Ziya-Reader开源，多个长文本中文任务第一](https://mp.weixin.qq.com/s/ucrvoTKBgQZZJxbr2NFP6g) (in Chinese; roughly, "Ziya LLM series | Built for knowledge retrieval: Ziya-Reader is open-sourced and ranks first on several Chinese long-text tasks").
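As a rough illustration of the position interpolation (PI) idea mentioned in this section, the sketch below rescales position indices before computing rotary (RoPE) angles, so that positions in the extended window map back into the range seen during pre-training. It is not the authors' training code. The 4096-token original window, the head dimension of 128, the RoPE base of 10000, and the helper names `pi_scale` and `rope_angles` are assumptions chosen for the example.

```python
def pi_scale(original_len, extended_len):
    """Scale factor applied to position indices under position interpolation."""
    return original_len / extended_len

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale < 1 compresses the position index."""
    scaled_pos = position * scale
    return [scaled_pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]

ORIGINAL_LEN = 4096  # assumed pre-training context window of the base model
EXTENDED_LEN = 8192  # the 8k window described above

scale = pi_scale(ORIGINAL_LEN, EXTENDED_LEN)

# The last position of the extended window lands inside the original range.
last_position = EXTENDED_LEN - 1
interpolated = rope_angles(last_position, head_dim=128, scale=scale)
extrapolated = rope_angles(last_position, head_dim=128)  # no interpolation

if __name__ == "__main__":
    print("scale:", scale)
    print("effective last position:", last_position * scale)
```

With interpolation, the largest effective position stays below the original window length, which is why a comparatively small amount of long-document fine-tuning can be enough to adapt the model.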
 ## Usage
 ### Environment