---
language:
- zh
- en
pipeline_tag: text-generation
inference: false

---

# Baichuan-7B-Instruction

![](./alpachino.png)


## Introduction

Baichuan-7B-Instruction is an instruction-tuned version of the Baichuan-7B series. The pretrained base model is available at [Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B).


## Demo

The following is a model demo built with gradio:

```python
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True, use_fast=False
)
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True
).half()
model.cuda()

def generate(histories, max_new_tokens=2048, do_sample=True, top_p=0.95,
             temperature=0.35, repetition_penalty=1.1):
    # Flatten the chat history into the Human/Assistant prompt format.
    prompt = ""
    for history in histories:
        prompt += "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        top_p=top_p,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
    )
    # Decode the full sequence, then strip the prompt to keep only the reply.
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    generate_text = rets[0].replace(prompt, "")
    return generate_text

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("clear")

    def user(user_message, history):
        # Append the new user turn with an empty assistant slot.
        return "", history + [[user_message, ""]]

    def bot(history):
        # Fill the last assistant slot with the model's reply.
        history[-1][1] = generate(history)
        return history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0")
```
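One caveat in the demo: `rets[0].replace(prompt, "")` assumes that decoding reproduces the prompt text exactly, which a tokenizer does not always guarantee. For a quick single-turn test without the web UI, the minimal sketch below reuses the same `Human:`/`Assistant:` prompt format but strips the prompt by slicing the output token IDs instead; it assumes `model` and `tokenizer` are already loaded as in the demo above.

```python
# Minimal single-turn inference sketch; assumes `model` and `tokenizer`
# are loaded as in the demo above.
def chat_once(question, max_new_tokens=512):
    prompt = "\nHuman:" + question + "\n\nAssistant:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        temperature=0.35,
        repetition_penalty=1.1,
    )
    # Keep only the newly generated tokens, so the prompt never needs
    # to be matched against decoded text.
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(chat_once("介绍一下北京的景点。"))
```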

## Quantized Deployment

Baichuan-7B supports int8 and int4 quantization; enabling it takes only a two-line change to the inference code. Note that if you quantize in order to save GPU memory, you should load the original-precision model onto the CPU before quantizing: avoid passing `device_map='auto'` (or any other argument that loads the original-precision model directly onto the GPU) to `from_pretrained`.

To use int8 quantization:

```python
import torch

# Load in fp16 (on the CPU by default), then quantize and move to the GPU.
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()
```

Similarly, to use int4 quantization:

```python
import torch

# Same pattern as int8: load in fp16 on the CPU, quantize, then move to the GPU.
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
```
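To confirm how much GPU memory quantization actually saves, PyTorch's built-in CUDA memory counters give a quick sanity check. This is a generic sketch using standard `torch.cuda` calls, not part of the model's own API:

```python
import torch

# Report GPU memory held by tensors (weights) and by the caching allocator.
torch.cuda.empty_cache()
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
```

Running it once after an fp16 `.half().cuda()` load and again after a quantized `.quantize(8).cuda()` load makes the savings directly comparable.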

## Training Details

Dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k

Hardware: 8×A40 GPUs
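The demo's `Human:`/`Assistant:` prompt layout suggests the fine-tuning data was flattened into that same format. Below is a hedged sketch of such a conversion; the record schema (`conversation`, `human`, `assistant` fields) is an assumption about ShareGPT-Chinese-English-90k, so check the dataset card before reusing it.

```python
# Hypothetical conversion of one ShareGPT-style record into the
# Human/Assistant prompt layout used by the demo. The field names
# below are assumptions; verify them against the dataset card.
def record_to_prompt(record):
    prompt = ""
    for turn in record["conversation"]:
        prompt += "\nHuman:" + turn["human"] + "\n\nAssistant:" + turn["assistant"]
    return prompt

example = {
    "conversation": [
        {"human": "你好", "assistant": "你好!有什么可以帮你的吗?"}
    ]
}
print(record_to_prompt(example))
```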

## Evaluation Results

### [CMMLU](https://github.com/haonan-li/CMMLU)

| Model (5-shot) | STEM | Humanities | Social Sciences | Others | China Specific | Average |
| ---------------------------------------------------------- | :-------: | :--------: | :-------------: | :------: | :------------: | :------: |
| Baichuan-7B |   34.4    |    47.5    |      47.6       |   46.6   |      44.3      |   44.0   |
| Vicuna-13B                                                 |   31.8    |    36.2    |      37.6       |   39.5   |      34.3      |   36.3   |
| Chinese-Alpaca-Plus-13B                                    |   29.8    |    33.4    |      33.2       |   37.9   |      32.1      |   33.4   |
| Chinese-LLaMA-Plus-13B                                     |   28.1    |    33.1    |      35.4       |   35.1   |      33.5      |   33.0   |
| Ziya-LLaMA-13B-Pretrain                                    |   29.0    |    30.7    |      33.8       |   34.4   |      31.9      |   32.1   |
| LLaMA-13B                                                  |   29.2    |    30.8    |      31.6       |   33.0   |      30.5      |   31.2   |
| moss-moon-003-base (16B)                                   |   27.2    |    30.4    |      28.8       |   32.6   |      28.7      |   29.6   |
| Baichuan-13B-Base                                          |   41.7    |    61.1    |      59.8       |   59.0   |      56.4      |   55.3   |
| Baichuan-13B-Chat                                          |   42.8    |  62.6  |    59.7  | 59.0 |    56.1    | 55.8 |
| Baichuan-13B-Instruction                              | 44.50 |   61.16    |      59.07      |  58.34   |     55.55      |  55.61   |
| **Baichuan-7B-Instruction**                                  | **34.68** | **47.38**  |    **47.13**    | **45.11** |   **44.51**    | **43.57** |

| Model (zero-shot) | STEM | Humanities | Social Sciences | Others | China Specific | Average |
| ------------------------------------------------------------ | :-------: | :--------: | :-------------: | :-------: | :------------: | :-------: |
| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b)      |   41.28   |   52.85    |      53.37      |   52.24   |     50.58      |   49.95   |
| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B)   |   32.79   |   44.43    |      46.78      |   44.79   |     43.11      |   42.33   |
| [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B)           |   32.22   |   42.91    |      44.81      |   42.60   |     41.93      |   40.79   |
| [BatGPT-15B](https://arxiv.org/abs/2307.00360)               |   33.72   |   36.53    |      38.07      |   46.94   |     38.32      |   38.51   |
| [Chinese-LLaMA-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) |   26.76   |   26.57    |      27.42      |   28.33   |     26.73      |   27.34   |
| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS)            |   25.68   |   26.35    |      27.21      |   27.92   |     26.70      |   26.88   |
| [Chinese-GLM-10B](https://github.com/THUDM/GLM)              |   25.57   |   25.01    |      26.33      |   25.94   |     25.81      |   25.80   |
| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) |   42.04   |   60.49    |      59.55      |   56.60   |     55.72      |   54.63   |
| [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) |   37.32   |   56.24    |      54.79      |   54.07   |     52.23      |   50.48   |
| Baichuan-13B-Instruction                                 | 42.56 | 62.09  |    60.41   | 58.97 |   56.95    | 55.88 |
| **Baichuan-7B-Instruction**                                  | **33.94** | **46.31**  |    **47.73**    | **45.84** |   **44.88**    | **43.53** |

> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess a language model's knowledge and reasoning ability in a Chinese context. We evaluated the models directly with the official [evaluation script](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score for [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) comes from our own run of the official CMMLU evaluation script; the scores of the other models are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results.

### English Benchmark Evaluation
In addition to the Chinese benchmarks, we also evaluated the model on the English benchmark MMLU.

#### MMLU

[MMLU](https://arxiv.org/abs/2009.03300) is an English evaluation dataset covering 57 tasks.
We used the open-source [evaluation framework](https://github.com/hendrycks/test); the results are as follows:

| Model                                  | Humanities | Social Sciences | STEM | Other | Average |
|----------------------------------------|-----------:|:---------------:|:----:|:-----:|:-------:|
| LLaMA-7B<sup>2</sup>                   |       34.0 |      38.3       | 30.5 | 38.1  |  35.1   |
| Falcon-7B<sup>1</sup>                  |          - |        -        |  -   |   -   |  35.0   |
| mpt-7B<sup>1</sup>                     |          - |        -        |  -   |   -   |  35.6   |
| ChatGLM-6B<sup>0</sup>                 |       35.4 |      41.0       | 31.3 | 40.5  |  36.9   |
| BLOOM 7B<sup>0</sup>                   |       25.0 |      24.4       | 26.5 | 26.4  |  25.5   |
| BLOOMZ 7B<sup>0</sup>                  |       31.3 |      42.1       | 34.4 | 39.0  |  36.1   |
| moss-moon-003-base (16B)<sup>0</sup>   |       24.2 |      22.8       | 22.4 | 24.4  |  23.6   |
| moss-moon-003-sft (16B)<sup>0</sup>    |       30.5 |      33.8       | 29.3 | 34.4  |  31.9   |
| Baichuan-7B<sup>0</sup>                |       38.4 |      48.9       | 35.6 | 48.1  |  42.3   |
| **Baichuan-7B-Instruction(5-shot)**            |       **38.9** |      **49.0**       | **35.3** | **48.8**  |  **42.6**   |
| **Baichuan-7B-Instruction(0-shot)**            |       **38.7** |      **47.9**       | **34.5** | **48.2**  |  **42.0**   |