Redflashing committed e114a8f (parent: e36ebf2): Update README.md

README.md (CHANGED):
---
language:
- en
pipeline_tag: text-generation
inference: false
---

# Baichuan-13B-Instruction

![](./alpachino.png)

<!-- Provide a quick summary of what the model is/does. -->

## Introduction

Baichuan-13B-Instruction is the instruction-tuned version of the Baichuan-13B series; the pretrained model is available as [Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base).

## Usage

The following is an example conversation with the model. The correct output is "乔戈里峰。世界第二高峰———乔戈里峰西方登山者称其为k2峰,海拔高度是8611米,位于喀喇昆仑山脉的中巴边境上" (roughly: "K2. The world's second-highest peak, called K2 by Western climbers, is 8,611 m high and sits on the China-Pakistan border in the Karakoram range.").

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
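from transformers import GenerationConfig

# The diff elides the rest of this snippet; the lines below are a plausible
# reconstruction following the usual Baichuan-13B chat recipe (ending in the
# print(response) visible in the diff context). The prompt string and the
# generation-config setup are assumptions, not part of the original.
tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction", use_fast=False, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
model.generation_config = GenerationConfig.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction"
)
messages = [{"role": "user", "content": "世界上第二高的山峰是哪座?"}]  # "Which is the world's second-highest mountain?"
response = model.chat(tokenizer, messages)  # chat() is provided by the model's remote code
print(response)
```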

Baichuan-13B supports int8 and int4 quantization, which takes only a two-line change to the inference code. Note that if you quantize in order to save GPU memory, you should load the original-precision model onto the CPU before quantizing; avoid passing `device_map='auto'` (or anything else that loads the full-precision model directly onto the GPU) to `from_pretrained`.

To use int8 quantization:

```python
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()
```

Similarly, to use int4 quantization:

```python
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
```
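
The quantized model is then used exactly like the full-precision one. As a minimal sketch, assuming the tokenizer setup and the `chat` helper from the conversation example above:

```python
messages = [{"role": "user", "content": "世界上第二高的山峰是哪座?"}]  # assumed prompt
response = model.chat(tokenizer, messages)  # chat() comes with the model's remote code
print(response)
```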

| Baichuan-13B | 25.4 |

The detailed model parameters are listed in the table below:

| Model | Hidden size | Layers | Heads | Vocab size | Total params | Training data (tokens) | Position embedding | Max length |
| ------------ | ---------- | ---- | ---- | -------- | -------------- | ------------------ | ----------------------------------------- | -------- |
| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
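
As a sanity check, the parameter total in the table can be reproduced from the other columns, assuming the LLaMA-style decoder layout that Baichuan-7B follows (SwiGLU MLP, RMSNorm, untied LM head, no biases); the MLP intermediate size of 11,008 is an assumption carried over from LLaMA-7B:

```python
# Recompute Baichuan-7B's total parameter count from the table's dimensions.
hidden, layers, vocab = 4096, 32, 64000
ffn = 11008  # assumed SwiGLU intermediate size (LLaMA-7B convention)

embed = vocab * hidden      # token embedding matrix
attn = 4 * hidden * hidden  # Q, K, V, O projections, per layer
mlp = 3 * hidden * ffn      # gate, up, down projections, per layer
norms = 2 * hidden          # two RMSNorm weight vectors, per layer
total = embed + layers * (attn + mlp + norms) + hidden + vocab * hidden  # + final norm + LM head
print(f"{total:,}")         # 7,000,559,616, matching the table exactly
```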

## [CMMLU](https://github.com/haonan-li/CMMLU)

| Model 5-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
| ---------------------------------------------------------- | :-------: | :--------: | :-------------: | :------: | :------------: | :------: |
| Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
| Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
| Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
| Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
| Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
| LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
| moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
| Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
| Baichuan-13B-Chat | 42.8 | **62.6** | **59.7** | **59.0** | **56.1** | **55.8** |
| **Baichuan-13B-Instruction** | **44.50** | 61.16 | 59.07 | 58.34 | 55.55 | 55.61 |

| Model zero-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
| ------------------------------------------------------------ | :-------: | :--------: | :-------------: | :-------: | :------------: | :-------: |