Commit 7842188 by s-JoL (parent: d323158)

Update README.md

Files changed (1): README.md (+13 -13)

README.md CHANGED
@@ -10,23 +10,21 @@ inference: false
 <!-- Provide a quick summary of what the model is/does. -->
 
 ## 介绍
-Baichuan-13B-Chat 是由百川智能继 [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) 之后开发的包含 130 亿参数的开源可商用的大规模语言模型,在标准的中文和英文 benchmark上均取得同尺寸最好的效果。本次发布包含有预训练 (Baichuan-13B-Base) 和对齐 (Baichuan-13B-Chat) 两个版本。Baichuan-13B 有如下几个特点:
+Baichuan-13B 是由百川智能继 [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) 之后开发的包含 130 亿参数的开源可商用的大规模语言模型,在标准的中文和英文 benchmark 上均取得同尺寸最好的效果。本次发布包含有预训练 (Baichuan-13B-Base) 和对齐 (Baichuan-13B-Chat) 两个版本。Baichuan-13B 有如下几个特点:
 
-1. **开源可商用百亿级别中文语言模型**:Baichuan-13B-Base 是免费开源可商用的百亿级别中文预训练语言模型。包含有130亿参数,没有经过任何 Instruction Tuning 或者针对 benchmark 的优化,纯净、高可定制。弥补了在中文领域缺乏 100 亿以上高可用中文预训练大模型的短板。
-2. **更大尺寸、更多数据**:在 Baichuan-7B 的基础上进一步扩大参数量到 130 亿,并且在高质量的语料上训练了 1.4 万亿 tokens,是当前开源 13B 尺寸下训练数据量最多的模型。支持中英双语,使用 [ALiBi](https://arxiv.org/abs/2108.12409) 位置编码,上下文窗口长度为 4096。
-3. **同时开源预训练和对齐模型**:预训练模型是适用开发者的”基座“,而广大普通用户对有对话功能的对齐模型具有更强的需求。因此本次开源我们同时发布了对齐模型(Baichuan-13B-Chat),具有很强的对话能力,开箱即用,支持很简单的部署。
-4. **更高效的推理**:为了支持更广大用户的使用,我们本次同时开源了 int8 和 int4 的量化版本,在几乎没有效果损失的情况下可以很方便的将模型部署在低显存机器上。
+1. **更大尺寸、更多数据**:Baichuan-13B 在 [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) 的基础上进一步扩大参数量到 130 亿,并且在高质量的语料上训练了 1.4 万亿 tokens,超过 LLaMA-13B 40%,是当前开源 13B 尺寸下训练数据量最多的模型。支持中英双语,使用 ALiBi 位置编码,上下文窗口长度为 4096。
+2. **同时开源预训练和对齐模型**:预训练模型是适用开发者的“基座”,而广大普通用户对有对话功能的对齐模型具有更强的需求。因此本次开源我们同时发布了对齐模型(Baichuan-13B-Chat),具有很强的对话能力,开箱即用,几行代码即可简单部署。
+3. **更高效的推理**:为了支持更广大用户的使用,我们本次同时开源了 INT8 和 INT4 的量化版本,在几乎没有效果损失的情况下可以很方便地将模型部署在如 3090 等消费级显卡上。
+4. **开源免费可商用**:Baichuan-13B 不仅对学术研究完全开放,开发者也仅需邮件申请并获得官方商用许可后,即可免费商用。
 
 ## Introduction
-Baichuan-13B is an open-source, commercially available large-scale language model with 130 billion parameters developed by Baichuan Intelligence following [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). It achieves the best performance in standard Chinese and English benchmarks of the same size. This release includes two versions: pre-training (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B has the following features:
+Baichuan-13B is an open-source, commercially usable large-scale language model developed by Baichuan Intelligence, following [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). With 13 billion parameters, it achieves the best performance in standard Chinese and English benchmarks among models of its size. This release includes two versions: pre-training (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B has the following features:
 
-1. **Open-source, commercially available billion-level Chinese language model**: Baichuan-13B-Base is a free, open-source, commercially available billion-level Chinese pre-training language model. It contains 130 billion parameters, has not undergone any Instruction Tuning or optimization for benchmarks, and is pure and highly customizable. It fills the gap in the lack of over 10 billion high-availability Chinese pre-training large models in the Chinese field.
-
-2. **Larger size, more data**: On the basis of Baichuan-7B, the parameter volume is further expanded to 130 billion, and 1.4 trillion tokens have been trained on high-quality corpora, making it the model with the most training data in the open-source 13B size. It supports both Chinese and English, uses [ALiBi](https://arxiv.org/abs/2108.12409) position encoding, and has a context window length of 4096.
-
-3. **Open-source pre-training and alignment models simultaneously**: The pre-training model is a "base" suitable for developers, while the general public has a stronger demand for alignment models with dialogue capabilities. Therefore, in this open-source release, we have also released an alignment model (Baichuan-13B-Chat) which has strong dialogue capabilities, is ready to use, and supports simple deployment.
-
-4. **More efficient inference**: To support a wider range of users, we have also open-sourced the int8 and int4 quantized versions this time. With almost no loss of effect, the model can be easily deployed on low-memory machines.
+1. **Larger size, more data**: Baichuan-13B further expands the parameter count to 13 billion on the basis of [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B), and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, making it the open-source 13B model trained on the most data. It supports both Chinese and English, uses ALiBi position encoding, and has a context window length of 4096.
+2. **Open-source pre-training and alignment models simultaneously**: The pre-training model is a "base" suited to developers, while the general public has a stronger demand for alignment models with dialogue capabilities. Therefore, this release also includes the alignment model (Baichuan-13B-Chat), which has strong dialogue capabilities, works out of the box, and can be deployed with just a few lines of code.
+3. **More efficient inference**: To support a wider range of users, we have also open-sourced INT8 and INT4 quantized versions. With almost no loss of quality, the model can be conveniently deployed on consumer GPUs such as the 3090.
+4. **Open-source, free, and commercially usable**: Baichuan-13B is not only fully open to academic research; developers may also use it commercially free of charge after applying by email and receiving official commercial permission.
 
 ## How to Get Started with the Model
 
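The INT8/INT4 point in the lists above can be made concrete. A minimal sketch of 8-bit loading, assuming the generic `load_in_8bit` path in `transformers` (backed by `bitsandbytes`) rather than whatever repo-specific quantization helper the full README may document:

```python
# Minimal 8-bit loading sketch; assumes `bitsandbytes` is installed and
# uses the generic transformers `load_in_8bit` path, not a
# Baichuan-specific quantization helper.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    device_map="auto",
    load_in_8bit=True,  # INT8 weights: roughly half the memory of float16
    trust_remote_code=True,
)
```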
@@ -53,7 +51,7 @@ tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_
 model = AutoModel.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
 model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")
 messages = []
-messages.append({"role": "user", "content": "The second highest mountain in the world is K2."})
+messages.append({"role": "user", "content": "Which mountain is the second highest one in the world?"})
 response = model.chat(tokenizer, messages)
 print(response)
 ```
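The snippet this hunk edits needs a few imports to run on its own. A self-contained sketch, with the import lines inferred from the identifiers in the diff (`model.chat` is a method supplied by the model's remote code via `trust_remote_code=True`):

```python
import torch
from transformers import AutoModel, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

# The commit swaps the statement-style prompt for a question, which better
# matches how a chat model is actually used.
messages = [{"role": "user", "content": "Which mountain is the second highest one in the world?"}]
response = model.chat(tokenizer, messages)
print(response)
```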
@@ -182,5 +180,7 @@ We conducted a `5-shot` evaluation under various benchmarks, using the same meth
 | **Baichuan-13B-Base** | **41.7** | **61.1** | **59.8** | **59.0** | **56.4** | **55.3** |
 | **Baichuan-13B-Chat** | **42.8** | **62.6** | **59.7** | **59.0** | **56.1** | **55.8** |
 
+> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess a language model's knowledge and reasoning abilities in Chinese contexts. We adopted its official [evaluation scheme](https://github.com/haonan-li/CMMLU).
+
 ## Our Group
 ![WeChat](https://github.com/baichuan-inc/baichuan-7B/blob/main/media/wechat.jpeg?raw=true)
 
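On the `5-shot` protocol the table refers to: each test question is preceded by five solved examples from the same task, and the model is scored on its completion of the sixth. A schematic of the prompt construction, with an illustrative template rather than the exact CMMLU or C-Eval harness format:

```python
def build_few_shot_prompt(demos, question):
    """Concatenate solved Q/A demonstrations ahead of the test question.

    `demos` is a list of (question, answer) pairs drawn from the task's dev
    split; in a 5-shot run it holds five of them. Real harnesses use
    benchmark-specific templates; this only shows the overall shape.
    """
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in demos]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Toy usage with arithmetic stand-ins for real benchmark items:
demos = [(f"{i}+{i}=?", str(2 * i)) for i in range(1, 6)]
print(build_few_shot_prompt(demos, "7+7=?"))
```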