ChloeAuYeung committed
Commit: 30040fc
1 Parent(s): 7368735
Update README.md
README.md CHANGED
@@ -26,7 +26,7 @@ inference: false
During the pre-training phase, **XVERSE-65B** mainly used 7 different types of data. The following table compares the pre-training datasets of XVERSE-65B with those of some other well-known models:

-| Data Type | GPT3[^1] | Llama[^2] | BLOOM[^3] | PaLM[^4] | Chinchilla[^5] | Gopher[^6] | MT-NLG[^7] | XVERSE-65B |
+| Data Type | [GPT3](https://arxiv.org/abs/2005.14165) | [Llama](https://arxiv.org/abs/2302.13971) | [BLOOM](https://arxiv.org/abs/2211.05100) | [PaLM](https://arxiv.org/abs/2204.02311) | [Chinchilla](https://arxiv.org/pdf/2203.15556) | [Gopher](https://arxiv.org/abs/2112.11446) | [MT-NLG](https://arxiv.org/abs/2201.11990) | XVERSE-65B |
|:-------:|:--------:|:---------:|:---------:|:--------:|:--------------:|:----------:|:----------:|:----------:|
| Web Pages | Y | Y | Y | Y | Y | Y | Y | Y |
| Code | | Y | Y | Y | Y | Y | Y | Y |
@@ -42,14 +42,6 @@ inference: false
|:-------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Proportion (%) | 72.91 | 7.09 | 4.81 | 5.62 | 6.55 | 1.15 | 1.87 |

-[^1]: GPT3 Paper: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-[^2]: LLaMA Paper: [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
-[^3]: BLOOM Paper: [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
-[^4]: PaLM Paper: [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
-[^5]: Chinchilla Paper: [Training Compute-Optimal Large Language Models](https://arxiv.org/pdf/2203.15556)
-[^6]: Gopher Paper: [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
-[^7]: MT-NLG Paper: [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990)
-
## Model Introduction

**XVERSE-65B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. The model released this time is the base model **XVERSE-65B**. Its key features are as follows:
@@ -61,7 +53,7 @@ inference: false
During the pre-training phase, **XVERSE-65B** primarily utilized 7 different types of data. The following table shows a comparison of the pre-training datasets of XVERSE-65B with some other well-known models:

-| Data Type | GPT3[^1] | Llama[^2] | BLOOM[^3] | PaLM[^4] | Chinchilla[^5] | Gopher[^6] | MT-NLG[^7] | XVERSE-65B |
+| Data Type | [GPT3](https://arxiv.org/abs/2005.14165) | [Llama](https://arxiv.org/abs/2302.13971) | [BLOOM](https://arxiv.org/abs/2211.05100) | [PaLM](https://arxiv.org/abs/2204.02311) | [Chinchilla](https://arxiv.org/pdf/2203.15556) | [Gopher](https://arxiv.org/abs/2112.11446) | [MT-NLG](https://arxiv.org/abs/2201.11990) | XVERSE-65B |
|:---------------:|:--------:|:---------:|:---------:|:--------:|:--------------:|:----------:|:----------:|:----------:|
| Web Pages | Y | Y | Y | Y | Y | Y | Y | Y |
| Code | | Y | Y | Y | Y | Y | Y | Y |
@@ -77,14 +69,6 @@ The sampling ratios of different data types during the pre-training phase are as
|:--------------:|:---------:|:----:|:------------:|:-----:|:---------------:|:----:|:-----:|
| Proportion (%) | 72.91 | 7.09 | 4.81 | 5.62 | 6.55 | 1.15 | 1.87 |

-[^1]: GPT3 Paper: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-[^2]: LLaMA Paper: [Large Language Models Are Multilingual Learners](https://arxiv.org/abs/2207.04672)
-[^3]: BLOOM Paper: [BLOOM: A Large Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
-[^4]: PaLM Paper: [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
-[^5]: Chinchilla Paper: [Chinchilla: A Large Language Model that Outperforms Gopher with 70x Fewer Parameters](https://arxiv.org/abs/2207.14280)
-[^6]: Gopher Paper: [Introducing Gopher: A Giant Language Model from DeepMind](https://arxiv.org/abs/2112.11446)
-[^7]: MT-NLG Paper: [MT-NLG: The Power of Scale for Machine Translation and Natural Language Generation](https://arxiv.org/abs/2202.07536)
-
## Evaluation Results

To comprehensively evaluate the model's performance, we conducted thorough testing on a series of standard datasets, including C-Eval, CMMLU, Gaokao-Bench, MMLU, GAOKAO-English, AGIEval, RACE-M, CommonSenseQA, PIQA, GSM8K, and HumanEval. These evaluations cover the model's capabilities across multiple domains, including Chinese question answering, English question answering, language understanding, commonsense question answering, logical reasoning, mathematical problem solving, and coding. The evaluation results are as follows:
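For a rough sense of what the sampling-ratio rows in the diff above imply, the percentages can be read as a categorical distribution over the seven pre-training data types. The following is a minimal Python sketch of proportional sampling under that reading; since the ratio table's category headers are not part of this excerpt, the labels below are hypothetical placeholders rather than XVERSE's actual category names.

```python
import random

# Sampling ratios (%) from the ratio row above; they sum to 100.
# The category headers are not visible in this excerpt, so these labels
# are placeholders, not the actual XVERSE-65B data-type names.
SAMPLING_RATIOS = {
    "category_1": 72.91,
    "category_2": 7.09,
    "category_3": 4.81,
    "category_4": 5.62,
    "category_5": 6.55,
    "category_6": 1.15,
    "category_7": 1.87,
}

# Sanity check: the seven proportions cover the whole mixture.
assert abs(sum(SAMPLING_RATIOS.values()) - 100.0) < 1e-6

def sample_data_types(n: int, seed: int = 0) -> list[str]:
    """Draw n data-type labels with probability proportional to their ratios."""
    rng = random.Random(seed)
    labels = list(SAMPLING_RATIOS)
    weights = list(SAMPLING_RATIOS.values())
    return rng.choices(labels, weights=weights, k=n)

if __name__ == "__main__":
    draws = sample_data_types(100_000)
    for label, ratio in SAMPLING_RATIOS.items():
        observed = 100 * draws.count(label) / len(draws)
        print(f"{label}: expected {ratio:.2f}%, observed {observed:.2f}%")
```

Over many draws the observed frequencies track the table's percentages, which is all the ratio row asserts about the mixture; it says nothing about how documents are ordered or deduplicated within each category.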