---
language:
- zh
- en
pipeline_tag: text-generation
inference: false
---
# Baichuan-13B-Instruction

![](./alpachino.png)


## Introduction
Baichuan-13B-Instruction is the instruction-finetuned version of the Baichuan-13B model series. The pretrained base model is available at [Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base).
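For a quick single-turn test, here is a minimal inference sketch. The model ID and sampling parameters are taken from the Gradio demo below; the `Human:`/`Assistant:` prompt format also follows that demo:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True, use_fast=False
)
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True
).cuda()

# Single-turn prompt in the same "Human:/Assistant:" format used by the demo below.
prompt = "\nHuman:What is the tallest mountain in the world?\n\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(
    input_ids, max_new_tokens=512, do_sample=True, top_p=0.95, temperature=0.35
)
# Slice off the prompt tokens so only the reply is printed.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```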


## Demo

Below is a model demo built with Gradio:
```python
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True, use_fast=False
)
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True
).half()
model.cuda()

def generate(histories, max_new_tokens=2048, do_sample=True, top_p=0.95, temperature=0.35, repetition_penalty=1.1):
    # Flatten the chat history into the "Human:/Assistant:" prompt format.
    prompt = ""
    for history in histories:
        history_with_identity = "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
        prompt += history_with_identity
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=max_new_tokens,
        early_stopping=True,
        do_sample=do_sample,
        top_p=top_p,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
    )
    # Decode and strip the prompt so only the newly generated reply remains.
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    generate_text = rets[0].replace(prompt, "")
    return generate_text

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("clear")

    def user(user_message, history):
        # Append the new user turn with an empty assistant slot.
        return "", history + [[user_message, ""]]

    def bot(history):
        # Fill in the assistant slot of the latest turn.
        bot_message = generate(history)
        history[-1][1] = bot_message
        return history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0")
```

## Quantized Deployment

Baichuan-13B supports int8 and int4 quantization, which requires changing only two lines of inference code. Note that if you quantize to save GPU memory, you should load the full-precision model onto the CPU before quantizing: avoid passing `device_map='auto'` (or anything else that places the full-precision model directly on the GPU) to `from_pretrained`.

To use int8 quantization:
```python
import torch

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()
```

Similarly, to use int4 quantization:
```python
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
```

## Model Details


### Model Architecture


The overall model is based on Baichuan-13B. For better inference performance, Baichuan-13B uses ALiBi linear biases, which are cheaper to compute than Rotary Embedding and significantly improve inference speed. Compared with the standard LLaMA-13B, the measured average speed for generating 2,000 tokens (tokens/s) is 31.6% higher:

| Model        | tokens/s |
| ------------ | -------- |
| LLaMA-13B    | 19.4     |
| Baichuan-13B | 25.4     |
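The idea behind ALiBi is easy to state in code: instead of rotating the query/key vectors, a head-specific linear penalty proportional to the query-key distance is added to the attention logits. A minimal sketch of the bias computation (written for a power-of-two head count; the ALiBi paper describes the extension to head counts like Baichuan-13B's 40):

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])
    # distance[i, j] = j - i, which is <= 0 for past keys (j <= i);
    # positions with j > i are removed by the causal mask anyway.
    positions = torch.arange(seq_len)
    distance = positions.view(1, -1) - positions.view(-1, 1)
    # Bias shape: (num_heads, seq_len, seq_len); added to attention logits
    # before softmax. No per-token rotation of Q/K is needed, which is
    # where the inference speedup over RoPE comes from.
    return slopes.view(-1, 1, 1) * distance

# Usage: scores = q @ k.transpose(-1, -2) / head_dim ** 0.5 + alibi_bias(num_heads, seq_len)
```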

The main hyperparameters are listed in the table below:
| Model name   | Hidden size | Layers | Heads | Vocab size | Total params   | Training data (tokens) | Position encoding                         | Max length |
| ------------ | ----------- | ------ | ----- | ---------- | -------------- | ---------------------- | ----------------------------------------- | ---------- |
| Baichuan-7B  | 4,096       | 32     | 32    | 64,000     | 7,000,559,616  | 1.2 trillion           | [RoPE](https://arxiv.org/abs/2104.09864)  | 4,096      |
| Baichuan-13B | 5,120       | 40     | 40    | 64,000     | 13,264,901,120 | 1.4 trillion           | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096      |
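As a sanity check, the parameter totals in the table can be reproduced from these dimensions. A short sketch, assuming a standard LLaMA-style block with untied input/output embeddings and SwiGLU FFN intermediate sizes of 11,008 (7B) and 13,696 (13B), values taken from the released model configs rather than the table above:

```python
def param_count(hidden: int, layers: int, vocab: int, ffn: int) -> int:
    embed = vocab * hidden          # input embedding
    lm_head = vocab * hidden        # untied output projection
    attn = 4 * hidden * hidden      # fused QKV plus output projection per layer
    mlp = 3 * hidden * ffn          # gate, up, and down projections (SwiGLU)
    norms = 2 * hidden              # two RMSNorm weight vectors per layer
    return embed + lm_head + layers * (attn + mlp + norms) + hidden  # + final norm

print(param_count(4096, 32, 64000, 11008))   # 7000559616  -> matches Baichuan-7B
print(param_count(5120, 40, 64000, 13696))   # 13264901120 -> matches Baichuan-13B
```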

## Training Details

The training dataset consists of three parts:

* 13k high-quality examples filtered from the [sharegpt_zh](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/ShareGPT) dataset.
* [lima](https://huggingface.co/datasets/GAIR/lima)
* 2.3k high-quality Chinese examples selected by task type, with roughly 100 examples per task type.

Hardware: 8×A40

## Evaluation Results

### [CMMLU](https://github.com/haonan-li/CMMLU)

| Model 5-shot                                               |   STEM    | Humanities | Social Sciences |  Others  | China Specific | Average  |
| ---------------------------------------------------------- | :-------: | :--------: | :-------------: | :------: | :------------: | :------: |
| Baichuan-7B |   34.4    |    47.5    |      47.6       |   46.6   |      44.3      |   44.0   |
| Vicuna-13B                                                 |   31.8    |    36.2    |      37.6       |   39.5   |      34.3      |   36.3   |
| Chinese-Alpaca-Plus-13B                                    |   29.8    |    33.4    |      33.2       |   37.9   |      32.1      |   33.4   |
| Chinese-LLaMA-Plus-13B                                     |   28.1    |    33.1    |      35.4       |   35.1   |      33.5      |   33.0   |
| Ziya-LLaMA-13B-Pretrain                                    |   29.0    |    30.7    |      33.8       |   34.4   |      31.9      |   32.1   |
| LLaMA-13B                                                  |   29.2    |    30.8    |      31.6       |   33.0   |      30.5      |   31.2   |
| moss-moon-003-base (16B)                                   |   27.2    |    30.4    |      28.8       |   32.6   |      28.7      |   29.6   |
| Baichuan-13B-Base                                          |   41.7    |    61.1    |    **59.8**     | **59.0** |    **56.4**    |   55.3   |
| Baichuan-13B-Chat                                          |   42.8    |  **62.6**  |      59.7       | **59.0** |      56.1      | **55.8** |
| **Baichuan-13B-Instruction**                               | **44.50** |   61.16    |      59.07      |  58.34   |     55.55      |  55.61   |

| Model zero-shot                                              |   STEM    | Humanities | Social Sciences |  Others   | China Specific |  Average  |
| ------------------------------------------------------------ | :-------: | :--------: | :-------------: | :-------: | :------------: | :-------: |
| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b)      |   41.28   |   52.85    |      53.37      |   52.24   |     50.58      |   49.95   |
| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B)   |   32.79   |   44.43    |      46.78      |   44.79   |     43.11      |   42.33   |
| [ChatGLM-6B](https://github.com/THUDM/GLM-130B)              |   32.22   |   42.91    |      44.81      |   42.60   |     41.93      |   40.79   |
| [BatGPT-15B](https://arxiv.org/abs/2307.00360)               |   33.72   |   36.53    |      38.07      |   46.94   |     38.32      |   38.51   |
| [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) |   26.76   |   26.57    |      27.42      |   28.33   |     26.73      |   27.34   |
| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS)            |   25.68   |   26.35    |      27.21      |   27.92   |     26.70      |   26.88   |
| [Chinese-GLM-10B](https://github.com/THUDM/GLM)              |   25.57   |   25.01    |      26.33      |   25.94   |     25.81      |   25.80   |
| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) |   42.04   |   60.49    |      59.55      |   56.60   |     55.72      |   54.63   |
| [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) |   37.32   |   56.24    |      54.79      |   54.07   |     52.23      |   50.48   |
| **Baichuan-13B-Instruction**                                 | **42.56** | **62.09**  |    **60.41**    | **58.97** |   **56.95**    | **55.88** |

> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess the knowledge and reasoning abilities of language models in a Chinese context. We evaluated the model directly with the official [evaluation scripts](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score for [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) comes from our own run of the official CMMLU evaluation scripts, while the other models' scores are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results; in the 5-shot table, the other models' scores are taken from the official [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) results.