ChloeAuYeung committed on
Commit
30040fc
1 Parent(s): 7368735

Update README.md

Files changed (1)
  1. README.md +2 -18
README.md CHANGED
@@ -26,7 +26,7 @@ inference: false
 
 During the pre-training phase, **XVERSE-65B** primarily used 7 different types of data. The following table compares the pre-training datasets of XVERSE-65B with those of several other well-known models:
 
- | Data Type | GPT3[^1] | Llama[^2] | BLOOM[^3] | PaLM[^4] | Chinchilla[^5] | Gopher[^6] | MT-NLG[^7] | XVERSE-65B |
+ | Data Type | [GPT3](https://arxiv.org/abs/2005.14165) | [Llama](https://arxiv.org/abs/2302.13971) | [BLOOM](https://arxiv.org/abs/2211.05100) | [PaLM](https://arxiv.org/abs/2204.02311) | [Chinchilla](https://arxiv.org/pdf/2203.15556) | [Gopher](https://arxiv.org/abs/2112.11446) | [MT-NLG](https://arxiv.org/abs/2201.11990) | XVERSE-65B |
  |:-------:|:--------:|:---------:|:---------:|:--------:|:--------------:|:----------:|:----------:|:----------:|
  | Web Pages | Y | Y | Y | Y | Y | Y | Y | Y |
  | Code | | Y | Y | Y | Y | Y | Y | Y |
@@ -42,14 +42,6 @@ inference: false
 |:-------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
 | Proportion (%) | 72.91 | 7.09 | 4.81 | 5.62 | 6.55 | 1.15 | 1.87 |
 
- [^1]: GPT3 Paper: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [^2]: LLaMA Paper: [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
- [^3]: BLOOM Paper: [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
- [^4]: PaLM Paper: [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
- [^5]: Chinchilla Paper: [Training Compute-Optimal Large Language Models](https://arxiv.org/pdf/2203.15556)
- [^6]: Gopher Paper: [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
- [^7]: MT-NLG Paper: [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990)
-
  ## Model Introduction
 
 **XVERSE-65B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. The model released this time is the base model **XVERSE-65B**. Its key features are as follows:
@@ -61,7 +53,7 @@ inference: false
 
 During the pre-training phase, **XVERSE-65B** primarily utilized 7 different types of data. The following table shows a comparison of the pre-training datasets of XVERSE-65B with some other well-known models:
 
- | Data Type | GPT3[^1] | Llama[^2] | BLOOM[^3] | PaLM[^4] | Chinchilla[^5] | Gopher[^6] | MT-NLG[^7] | XVERSE-65B |
+ | Data Type | [GPT3](https://arxiv.org/abs/2005.14165) | [Llama](https://arxiv.org/abs/2302.13971) | [BLOOM](https://arxiv.org/abs/2211.05100) | [PaLM](https://arxiv.org/abs/2204.02311) | [Chinchilla](https://arxiv.org/pdf/2203.15556) | [Gopher](https://arxiv.org/abs/2112.11446) | [MT-NLG](https://arxiv.org/abs/2201.11990) | XVERSE-65B |
  |:---------------:|:--------:|:---------:|:---------:|:--------:|:--------------:|:----------:|:----------:|:----------:|
  | Web Pages | Y | Y | Y | Y | Y | Y | Y | Y |
  | Code | | Y | Y | Y | Y | Y | Y | Y |
@@ -77,14 +69,6 @@ The sampling ratios of different data types during the pre-training phase are as
 |:--------------:|:---------:|:----:|:------------:|:-----:|:---------------:|:----:|:-----:|
 | Proportion (%) | 72.91 | 7.09 | 4.81 | 5.62 | 6.55 | 1.15 | 1.87 |
 
- [^1]: GPT3 Paper: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [^2]: LLaMA Paper: [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
- [^3]: BLOOM Paper: [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
- [^4]: PaLM Paper: [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
- [^5]: Chinchilla Paper: [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
- [^6]: Gopher Paper: [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
- [^7]: MT-NLG Paper: [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990)
-
 ## Evaluation Results
 
 To comprehensively evaluate the model's performance, we conducted thorough testing on a series of standard datasets, including C-Eval, CMMLU, Gaokao-Bench, MMLU, GAOKAO-English, AGIEval, RACE-M, CommonSenseQA, PIQA, GSM8K, and HumanEval. These evaluations cover the model's capabilities across multiple domains, specifically Chinese question answering, English question answering, language understanding, common-sense question answering, logical reasoning, mathematical problem solving, and programming ability. The evaluation results are as follows:
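 
 For reference, below is a minimal sketch of how one of these benchmarks might query the base model. It assumes the weights load through the Hugging Face `transformers` API (`AutoModelForCausalLM` with `trust_remote_code=True`) under the repo id `xverse/XVERSE-65B`; the dtype, device mapping, and generation settings are illustrative assumptions rather than settings taken from this README.
 
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 MODEL_ID = "xverse/XVERSE-65B"  # assumed repo id; replace with a local path if the weights are downloaded
 
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
     MODEL_ID,
     torch_dtype=torch.bfloat16,  # assumed low-precision load; a 65B model needs multiple GPUs
     device_map="auto",           # shard the weights across available devices
     trust_remote_code=True,
 )
 model.eval()
 
 # A GSM8K-style math question: the base model continues the prompt, and an
 # evaluation harness would compare the extracted answer with the reference.
 prompt = (
     "Question: Natalia sold clips to 48 of her friends in April, and then she sold "
     "half as many clips in May. How many clips did Natalia sell altogether in April and May?\n"
     "Answer:"
 )
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 with torch.no_grad():
     output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
 print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
 ```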
 