baichuan-inc/Baichuan-13B-Chat

@@ -12,12 +12,12 @@ inference: false
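 ## Introduction
 Baichuan-13B-Chat is the aligned version in the Baichuan-13B series of models; the pre-trained model is available at [Baichuan-13B-Base](https://github.com/baichuan-inc/Baichuan-13B-Base).
-[Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligence as the successor to [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). It achieves the best results among models of the same size on standard Chinese and English benchmarks. This release includes two versions: pre-trained (Baichuan-13B-Base) and aligned (Baichuan-13B-Chat). Baichuan-13B has the following features:
-  1. **Larger size, more data**: Building on [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B), Baichuan-13B scales the parameter count to 13 billion and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, which makes it the open-source 13B-size model trained on the most data to date. It supports both Chinese and English, uses ALiBi positional encoding, and has a context window length of 4096.
   2. **Pre-trained and aligned models released together**: The pre-trained model is a "base" suited to developers, whereas general users have a stronger need for an aligned model with dialogue capability. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong dialogue capability, works out of the box, and can be deployed with a few lines of code.
-  3. **More efficient inference**: To support a wider range of users, we have also open-sourced INT8 and INT4 quantized versions, which make it easy to deploy the model on consumer-grade GPUs such as the 3090 with almost no loss in quality.
-  4. **Open source, free, and commercially usable**: Baichuan-13B is fully open to academic research, and developers can use it commercially free of charge after applying by email and obtaining an official commercial license.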
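The introduction names ALiBi as the positional encoding. As a rough illustration of the idea only (not Baichuan-13B's actual implementation), the sketch below computes the per-head slopes and the linear attention bias described in the ALiBi paper. The head count and sequence length are hypothetical, and the sketch handles power-of-two head counts only.

```python
# Minimal ALiBi sketch in plain Python. Assumptions: power-of-two head count,
# toy sizes; this is an illustration, not the model's own code.


def alibi_slopes(n_heads: int) -> list:
    """Per-head slopes 2^(-8*i/n) for i = 1..n (power-of-two n only)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]


def alibi_bias(n_heads: int, seq_len: int) -> list:
    """bias[h][i][j] = -slope_h * (i - j) for keys j at or before query i.

    Future positions (j > i) are set to 0.0 here; a causal mask is assumed
    to hide them separately.
    """
    return [
        [
            [-(m * (i - j)) if j <= i else 0.0 for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(n_heads)
    ]


if __name__ == "__main__":
    print(alibi_slopes(8))
    print(alibi_bias(8, 4)[0])
```

The bias is added to the attention scores before the softmax, so more distant keys are penalized linearly and no learned position embeddings are needed.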
@@ -132,54 +132,50 @@ For specific training settings, please refer to [Baichuan-13B](https://github.co
 ## Evaluation Results
-We ran `5-shot` evaluations on several benchmarks, using the same method as in the [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B/) project. The results are as follows:
-### C-Eval
 | Model 5-shot            | STEM  | Social Sciences | Humanities | Others | Average |
-|-------------------------|-------|-----------------|------------|--------|---------|
-| ChatGLM2-6B             | 45.9  | 61.6            | 49.7       | 48.2   | 50.2    |
-| InternLM-7B<sup>*</sup> | 40.1  | 55.7            | 49.4       | 37.9   | 44.6    |
 | Baichuan-7B             | 38.2  | 52.0            | 46.2       | 39.3   | 42.8    |
 | Ziya-LLaMA-13B-Pretrain | 27.6  | 34.4            | 32.0       | 28.6   | 30.0    |
 | LLaMA-13B               | 27.0  | 33.6            | 27.7       | 27.6   | 28.5    |
 | moss-moon-003-base (16B)| 27.0  | 29.1            | 27.2       | 26.9   | 27.4    |
 | vicuna-13B              | 22.8  | 24.8            | 22.3       | 18.5   | 22.2    |
 | **Baichuan-13B-Base**   | **45.9** | **63.5** | **57.2**    | **49.3** | **52.4** |
 | **Baichuan-13B-Chat**   | **43.7** | **64.6** | **56.2**    | **49.2** | **51.5** |
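-> *Note: The results for all models in the table were obtained with the same evaluation code. [InternLM-7B](https://huggingface.co/internlm/internlm-7b) reports a C-Eval average of 53.4 when evaluated with the [OpenCompass](https://opencompass.org.cn/rank) tool; when we evaluated InternLM-7B with OpenCompass, we obtained an average of 51.6.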
-### MMLU
 | Model 5-shot            | STEM  | Social Sciences | Humanities | Others | Average |
-|-------------------------|-------|-----------------|------------|--------|---------|
 | LLaMA-13B               | 36.1  | 53.0            | 44.0       | 52.8   | 46.3    |
-| ChatGLM2-6B             | 38.2  | 52.5            | 43.2       | 50.8   | 45.9    |
-| InternLM-7B             | 38.0  | 51.1            | 39.2       | 50.2   | 44.1    |
 | Ziya-LLaMA-13B-Pretrain | 35.6  | 47.6            | 40.1       | 49.4   | 42.9    |
 | Baichuan-7B             | 35.6  | 48.9            | 38.4       | 48.1   | 42.3    |
 | vicuna-13B              | 24.2  | 24.1            | 24.6       | 26.8   | 24.9    |
 | moss-moon-003-base (16B)| 22.4  | 22.8            | 24.2       | 24.4   | 23.6    |
 | **Baichuan-13B-Base**   | **41.6** | **60.9** | **47.4**    | **58.5** | **51.6** |
 | **Baichuan-13B-Chat**   | **40.9** | **60.9** | **48.8**    | **59.0** | **52.1** |
-### CMMLU
 | Model 5-shot            | STEM  | Humanities | Social Sciences | Others | China Specific | Average |
-|-------------------------|-------|------------|-----------------|--------|----------------|---------|
-| InternLM-7B             | 41.7  | 54.4       | 56.4            | 55.4   | 53.1           | 52.1    |
-| ChatGLM2-6B             | 42.5  | 51.4       | 51.4            | 50.7   | 48.4           | 49.0    |
 | Baichuan-7B             | 34.4  | 47.5       | 47.6            | 46.6   | 44.3           | 44.0    |
 | Ziya-LLaMA-13B-Pretrain | 29.0  | 30.7       | 33.8            | 34.4   | 31.9           | 32.1    |
 | LLaMA-13B               | 29.2  | 30.8       | 31.6            | 33.0   | 30.5           | 31.2    |
 | moss-moon-003-base (16B)| 27.2  | 30.4       | 28.8            | 32.6   | 28.7           | 29.6    |
 | vicuna-13B              | 24.0  | 25.4       | 25.3            | 25.0   | 25.0           | 24.9    |
 | **Baichuan-13B-Base**   | **41.7** | **61.1** | **59.8** | **59.0**          | **56.4** | **55.3** |
 | **Baichuan-13B-Chat**   | **42.8** | **62.6** | **59.7** | **59.0**          | **56.1** | **55.8** |
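-> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese context. We used its official [evaluation scheme](https://github.com/haonan-li/CMMLU).
 ## WeChat Group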
 ![WeChat](https://github.com/baichuan-inc/Baichuan-13B/blob/main/media/wechat.jpeg?raw=true)

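 ## Introduction
 Baichuan-13B-Chat is the aligned version in the Baichuan-13B series of models; the pre-trained model is available at [Baichuan-13B-Base](https://github.com/baichuan-inc/Baichuan-13B-Base).
+[Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligence as the successor to [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). It achieves the best results among models of the same size on authoritative Chinese and English benchmarks. This release includes two versions: pre-trained ([Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base)) and aligned ([Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)). Baichuan-13B has the following features:
+  1. **Larger size, more data**: Building on [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B), Baichuan-13B scales the parameter count to 13 billion and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, which makes it the open-source 13B-size model trained on the most data to date. It supports both Chinese and English, uses ALiBi positional encoding, and has a context window length of 4096.
   2. **Pre-trained and aligned models released together**: The pre-trained model is a "base" suited to developers, whereas general users have a stronger need for an aligned model with dialogue capability. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong dialogue capability, works out of the box, and can be deployed with a few lines of code.
+  3. **More efficient inference**: To support a wider range of users, we have also open-sourced int8 and int4 quantized versions. Compared with the non-quantized version, they greatly lower the hardware requirements for deployment with almost no loss in quality, and can be deployed on consumer-grade GPUs such as the Nvidia 3090.
+  4. **Open source, free, and commercially usable**: Baichuan-13B is fully open to academic research, and developers can use it commercially free of charge after applying by email and obtaining an official commercial license.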
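To make the quantization point concrete, here is a back-of-the-envelope estimate of how much memory the weights alone occupy at each precision. It is a sketch under stated assumptions: a round 13 billion parameters (the figure quoted above), weights only (activations, KV cache, and quantization metadata are ignored), and the 24 GB nominal memory of an Nvidia RTX 3090. Actual usage will be higher than these numbers.

```python
# Weight-only memory estimate at different precisions.
# Assumptions: exactly 13e9 parameters; 1 GB = 1e9 bytes; runtime overhead
# (activations, KV cache, quantization scales) is not counted.

N_PARAMS = 13_000_000_000
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}
GPU_MEMORY_GB = 24  # nominal memory of an Nvidia RTX 3090


def weight_memory_gb(n_params: int, bits: int) -> float:
    """Size of the weights in decimal gigabytes."""
    return n_params * bits / 8 / 1e9


def estimate_all(n_params: int = N_PARAMS) -> dict:
    """Map each precision name to its weight-only footprint in GB."""
    return {name: weight_memory_gb(n_params, bits)
            for name, bits in BITS_PER_PARAM.items()}


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
        print(f"{name}: ~{gb:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions the fp16 weights alone exceed 24 GB, while the int8 and int4 weights leave headroom, which is consistent with the claim that the quantized versions can run on a consumer-grade card.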
 ## Evaluation Results
+## [C-Eval](https://cevalbenchmark.com/index.html#home)
 | Model 5-shot            | STEM  | Social Sciences | Humanities | Others | Average |
+|-------------------------|:-----:|:---------------:|:----------:|:------:|:-------:|
 | Baichuan-7B             | 38.2  | 52.0            | 46.2       | 39.3   | 42.8    |
+| Chinese-Alpaca-Plus-13B | 35.2  | 45.6            | 40.0       | 38.2   | 38.8    |
+| Chinese-LLaMA-Plus-13B  | 30.3  | 38.0            | 32.9       | 29.1   | 32.1    |
 | Ziya-LLaMA-13B-Pretrain | 27.6  | 34.4            | 32.0       | 28.6   | 30.0    |
 | LLaMA-13B               | 27.0  | 33.6            | 27.7       | 27.6   | 28.5    |
 | moss-moon-003-base (16B)| 27.0  | 29.1            | 27.2       | 26.9   | 27.4    |
 | vicuna-13B              | 22.8  | 24.8            | 22.3       | 18.5   | 22.2    |
 | **Baichuan-13B-Base**   | **45.9** | **63.5** | **57.2**    | **49.3** | **52.4** |
 | **Baichuan-13B-Chat**   | **43.7** | **64.6** | **56.2**    | **49.2** | **51.5** |
+## [MMLU](https://arxiv.org/abs/2009.03300)
 | Model 5-shot            | STEM  | Social Sciences | Humanities | Others | Average |
+|-------------------------|:-----:|:---------------:|:----------:|:------:|:-------:|
 | LLaMA-13B               | 36.1  | 53.0            | 44.0       | 52.8   | 46.3    |
+| Chinese-Alpaca-Plus-13B | 36.9  | 48.9            | 40.5       | 50.5   | 43.9    |
 | Ziya-LLaMA-13B-Pretrain | 35.6  | 47.6            | 40.1       | 49.4   | 42.9    |
 | Baichuan-7B             | 35.6  | 48.9            | 38.4       | 48.1   | 42.3    |
+| Chinese-LLaMA-Plus-13B  | 33.1  | 42.8            | 37.0       | 44.6   | 39.2    |
 | vicuna-13B              | 24.2  | 24.1            | 24.6       | 26.8   | 24.9    |
 | moss-moon-003-base (16B)| 22.4  | 22.8            | 24.2       | 24.4   | 23.6    |
 | **Baichuan-13B-Base**   | **41.6** | **60.9** | **47.4**    | **58.5** | **51.6** |
 | **Baichuan-13B-Chat**   | **40.9** | **60.9** | **48.8**    | **59.0** | **52.1** |
+> Note: We used the official MMLU [evaluation scheme](https://github.com/hendrycks/test).
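For readers unfamiliar with the 5-shot setting used in these tables, the sketch below shows how a few-shot multiple-choice prompt is commonly assembled: a subject header, up to `k` solved examples, then the unanswered test question. The layout is modeled loosely on the style of the MMLU evaluation code; the function names and the toy questions are hypothetical, and this is not the exact code used to produce the scores above.

```python
# Minimal k-shot prompt builder for multiple-choice questions.
# Assumptions: four choices labelled A-D; examples are (question, choices, answer).

CHOICE_LABELS = ["A", "B", "C", "D"]


def format_example(question, choices, answer=None):
    """Render one question; leave 'Answer:' open when no answer is given."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, choices)]
    lines.append("Answer:" if answer is None else f"Answer: {answer}")
    return "\n".join(lines)


def build_few_shot_prompt(subject, dev_examples, test_question, test_choices, k=5):
    """Header, up to k solved examples, then the open test question."""
    header = f"The following are multiple choice questions (with answers) about {subject}."
    shots = [format_example(q, c, a) for q, c, a in dev_examples[:k]]
    query = format_example(test_question, test_choices)
    return "\n\n".join([header, *shots, query])


if __name__ == "__main__":
    # Toy data for illustration only.
    dev = [(f"What is {i} + {i}?", [str(2 * i), "0", "1", "3"], "A") for i in range(5, 10)]
    print(build_few_shot_prompt("arithmetic", dev, "What is 2 + 3?", ["4", "5", "6", "7"]))
```

The model's prediction is then typically read from whichever answer letter it ranks highest as the continuation of the final `Answer:`.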
+## [CMMLU](https://github.com/haonan-li/CMMLU)
 | Model 5-shot            | STEM  | Humanities | Social Sciences | Others | China Specific | Average |
+|-------------------------|:-----:|:----------:|:---------------:|:------:|:--------------:|:-------:|
 | Baichuan-7B             | 34.4  | 47.5       | 47.6            | 46.6   | 44.3           | 44.0    |
+| Chinese-Alpaca-Plus-13B | 29.8  | 33.4       | 33.2            | 37.9   | 32.1           | 33.4    |
+| Chinese-LLaMA-Plus-13B  | 28.1  | 33.1       | 35.4            | 35.1   | 33.5           | 33.0    |
 | Ziya-LLaMA-13B-Pretrain | 29.0  | 30.7       | 33.8            | 34.4   | 31.9           | 32.1    |
 | LLaMA-13B               | 29.2  | 30.8       | 31.6            | 33.0   | 30.5           | 31.2    |
 | moss-moon-003-base (16B)| 27.2  | 30.4       | 28.8            | 32.6   | 28.7           | 29.6    |
 | vicuna-13B              | 24.0  | 25.4       | 25.3            | 25.0   | 25.0           | 24.9    |
 | **Baichuan-13B-Base**   | **41.7** | **61.1** | **59.8** | **59.0**          | **56.4** | **55.3** |
 | **Baichuan-13B-Chat**   | **42.8** | **62.6** | **59.7** | **59.0**          | **56.1** | **55.8** |
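+> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese context. We used its official [evaluation scheme](https://github.com/haonan-li/CMMLU).
 ## WeChat Group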
 ![WeChat](https://github.com/baichuan-inc/Baichuan-13B/blob/main/media/wechat.jpeg?raw=true)