zhao1iang committed on
Commit 017b2ae
1 Parent(s): 26c3569

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -111,10 +111,10 @@ $$loss = \sum^{n}_{i=1} log(p_i) / n = log( \prod_{i=1}^n p_i) / n$$
 
  Here $n$ is the length of the document in tokens and $p_i$ is the probability of the ground-truth token at position $i$. The product of the ground-truth token probabilities over all positions is exactly the probability of generating the document, which ties the loss to the document-generation probability. Because different models use different tokenizers and therefore different token counts, we multiply the loss by the token count $n$, leaving only the document-generation probability, so different models can be compared. We then exponentiate the normalized loss into perplexity to make the differences between models easier to read. For readability, the loss and ppl mentioned below refer to each model's normalized loss and perplexity.
 
- Based on the above analysis, we selected several hundred to over a thousand high-quality articles newly published in October 2023 across multiple domains and verified them manually, guaranteeing that none of the test data appears in the training set of the Skywork model or of any other model, and that the test data is broad in source and high in quality. Because we can always evaluate ppl on the most recently published articles, it is very hard for a model to cheat.
+ Based on the above analysis, we selected several hundred to over a thousand high-quality articles newly published in September 2023 across multiple domains and verified them manually, guaranteeing that none of the test data appears in the training set of the Skywork model or of any other model, and that the test data is broad in source and high in quality. Because we can always evaluate ppl on the most recently published articles, it is very hard for a model to cheat.
  The figure below lists different open-source models; Skywork-13B-Base achieves the best result, showing that the foundational Chinese capability of our base model is the strongest among domestic open-source models.
 
- We have chosen several hundred to thousands of high-quality articles that were published in October 2023 across various fields. We have manually verified these articles to ensure their quality. It is important to note that none of the test data used in evaluating the Skywork model or any other models is included in their training set. Furthermore, the test data is diverse and of high quality, making it challenging for the models to gain an unfair advantage.
+ We have chosen several hundred to thousands of high-quality articles that were published after September 1, 2023, across various fields. We have manually verified these articles to ensure their quality. It is important to note that none of the test data used in evaluating the Skywork model or any other models is included in their training set. Furthermore, the test data is diverse and of high quality, making it challenging for the models to gain an unfair advantage.
 
  The figure below displays the performance of different open source models. Skywork-13B-Base achieves the best results.
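
To make the normalization described in this hunk concrete, here is a minimal sketch, not the authors' evaluation code, of how the tokenizer-independent document loss could be computed with Hugging Face `transformers`. The checkpoint id `Skywork/Skywork-13B-base`, the handling of the first (unpredicted) token, and the per-character normalization before exponentiating into perplexity are assumptions, not details taken from this commit.

```python
# Minimal sketch (not the official Skywork evaluation script) of the
# document-level loss described above: the mean token loss multiplied by the
# number of predicted tokens, i.e. -log P(document), which does not depend on
# the tokenizer and can therefore be compared across models.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Skywork/Skywork-13B-base"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()


@torch.no_grad()
def document_nll(text: str) -> float:
    """Return -log P(document) = mean cross-entropy * number of predicted tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    # With labels=input_ids the model returns the mean next-token cross-entropy
    # over the n-1 predicted positions (the first token has no prediction).
    mean_loss = model(input_ids=ids, labels=ids).loss.item()
    n_predicted = ids.shape[1] - 1
    # Multiplying by the token count undoes the 1/n averaging, so the value
    # depends only on the document probability, not on the tokenizer.
    return mean_loss * n_predicted


doc = "这里放一篇2023年新发布的测试文章全文。"  # placeholder document
nll = document_nll(doc)
# Assumption: normalize per character before exponentiating into perplexity;
# the README text only says that the normalized loss is exponentiated.
ppl = math.exp(nll / max(len(doc), 1))
print(f"-log P(doc) = {nll:.2f}   ppl = {ppl:.2f}")
```

Summing the per-token losses rather than averaging them is what removes the tokenizer dependence: two models that segment the same article into different numbers of tokens still assign it a single, directly comparable log-probability.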