ChloeAuYeung committed on
Commit
30040fc
1 Parent(s): 7368735

Update README.md

Files changed (1)
  1. README.md +2 -18
README.md CHANGED
@@ -26,7 +26,7 @@ inference: false
 
 During the pre-training phase, **XVERSE-65B** primarily used 7 different types of data. The following table compares the pre-training datasets of XVERSE-65B with those of several other well-known models:
 
- | Data Type | GPT3[^1] | Llama[^2] | BLOOM[^3] | PaLM[^4] | Chinchilla[^5] | Gopher[^6] | MT-NLG[^7] | XVERSE-65B |
+ | Data Type | [GPT3](https://arxiv.org/abs/2005.14165) | [Llama](https://arxiv.org/abs/2302.13971) | [BLOOM](https://arxiv.org/abs/2211.05100) | [PaLM](https://arxiv.org/abs/2204.02311) | [Chinchilla](https://arxiv.org/pdf/2203.15556) | [Gopher](https://arxiv.org/abs/2112.11446) | [MT-NLG](https://arxiv.org/abs/2201.11990) | XVERSE-65B |
  |:-------:|:--------:|:---------:|:---------:|:--------:|:--------------:|:----------:|:----------:|:----------:|
  | Web Pages | Y | Y | Y | Y | Y | Y | Y | Y |
  | Code | | Y | Y | Y | Y | Y | Y | Y |
@@ -42,14 +42,6 @@ inference: false
 |:-------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
 | Proportion (%) | 72.91 | 7.09 | 4.81 | 5.62 | 6.55 | 1.15 | 1.87 |
 
- [^1]: GPT3 Paper: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [^2]: LLaMA Paper: [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
- [^3]: BLOOM Paper: [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
- [^4]: PaLM Paper: [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
- [^5]: Chinchilla Paper: [Training Compute-Optimal Large Language Models](https://arxiv.org/pdf/2203.15556)
- [^6]: Gopher Paper: [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
- [^7]: MT-NLG Paper: [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990)
-
  ## Model Introduction
 
 **XVERSE-65B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. The model released this time is the base model **XVERSE-65B**. Its key features are as follows:
@@ -61,7 +53,7 @@ inference: false
 
 During the pre-training phase, **XVERSE-65B** primarily utilized 7 different types of data. The following table shows a comparison of the pre-training datasets of XVERSE-65B with some other well-known models:
 
- | Data Type | GPT3[^1] | Llama[^2] | BLOOM[^3] | PaLM[^4] | Chinchilla[^5] | Gopher[^6] | MT-NLG[^7] | XVERSE-65B |
+ | Data Type | [GPT3](https://arxiv.org/abs/2005.14165) | [Llama](https://arxiv.org/abs/2302.13971) | [BLOOM](https://arxiv.org/abs/2211.05100) | [PaLM](https://arxiv.org/abs/2204.02311) | [Chinchilla](https://arxiv.org/pdf/2203.15556) | [Gopher](https://arxiv.org/abs/2112.11446) | [MT-NLG](https://arxiv.org/abs/2201.11990) | XVERSE-65B |
  |:---------------:|:--------:|:---------:|:---------:|:--------:|:--------------:|:----------:|:----------:|:----------:|
  | Web Pages | Y | Y | Y | Y | Y | Y | Y | Y |
  | Code | | Y | Y | Y | Y | Y | Y | Y |
@@ -77,14 +69,6 @@ The sampling ratios of different data types during the pre-training phase are as
 |:--------------:|:---------:|:----:|:------------:|:-----:|:---------------:|:----:|:-----:|
 | Proportion (%) | 72.91 | 7.09 | 4.81 | 5.62 | 6.55 | 1.15 | 1.87 |
 
- [^1]: GPT3 Paper: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [^2]: LLaMA Paper: [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
- [^3]: BLOOM Paper: [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
- [^4]: PaLM Paper: [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
- [^5]: Chinchilla Paper: [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
- [^6]: Gopher Paper: [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
- [^7]: MT-NLG Paper: [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990)
-
 ## Evaluation Results
 
 To comprehensively evaluate the model's performance, we conducted thorough testing on a series of standard datasets, including C-Eval, CMMLU, Gaokao-Bench, MMLU, GAOKAO-English, AGIEval, RACE-M, CommonSenseQA, PIQA, GSM8K, and HumanEval. These evaluations cover the model's capabilities across multiple domains, specifically Chinese question answering, English question answering, language understanding, common-sense question answering, logical reasoning, mathematical problem solving, and programming ability. The evaluation results are as follows:
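 
 For reference, below is a minimal sketch of how one of these benchmarks might query the base model. It assumes the weights load through the Hugging Face `transformers` API (`AutoModelForCausalLM` with `trust_remote_code=True`) under the repo id `xverse/XVERSE-65B`; the dtype, device mapping, and generation settings are illustrative assumptions rather than settings taken from this README.
 
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 MODEL_ID = "xverse/XVERSE-65B"  # assumed repo id; replace with a local path if the weights are downloaded
 
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
     MODEL_ID,
     torch_dtype=torch.bfloat16,  # assumed low-precision load; a 65B model needs multiple GPUs
     device_map="auto",           # shard the weights across available devices
     trust_remote_code=True,
 )
 model.eval()
 
 # A GSM8K-style math question: the base model continues the prompt, and an
 # evaluation harness would compare the extracted answer with the reference.
 prompt = (
     "Question: Natalia sold clips to 48 of her friends in April, and then she sold "
     "half as many clips in May. How many clips did Natalia sell altogether in April and May?\n"
     "Answer:"
 )
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 with torch.no_grad():
     output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
 print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
 ```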
 