Baichuan-13B-Instruction

介绍

Baichuan-13B-Instruction 为 Baichuan-13B 系列模型进行指令微调后的版本，预训练模型可见 Baichuan-13B-Base。

Demo

如下是一个使用 gradio 的模型 demo

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction",trust_remote_code=True,use_fast=False)
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction",trust_remote_code=True ).half()
model.cuda()

def generate(histories,  max_new_tokens=2048, do_sample = True, top_p = 0.95, temperature = 0.35, repetition_penalty=1.1):
    prompt = ""
    for history in histories:
        history_with_identity = "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
        prompt += history_with_identity
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
                    input_ids = input_ids,
                    max_new_tokens=max_new_tokens,
                    early_stopping=True,
                    do_sample=do_sample,
                    top_p=top_p, 
                    temperature=temperature,
                    repetition_penalty=repetition_penalty,
        )
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    generate_text = rets[0].replace(prompt, "")
    return generate_text
    
with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("clear")

    def user(user_message, history):
        return "", history + [[user_message, ""]]

    def bot(history):
        print(history)
        bot_message = generate(history)
        history[-1][1] = bot_message
        return history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0")

量化部署

Baichuan-13B 支持 int8 和 int4 量化，用户只需在推理代码中简单修改两行即可实现。请注意，如果是为了节省显存而进行量化，应加载原始精度模型到 CPU 后再开始量化；避免在 from_pretrained 时添加 device_map='auto' 或者其它会导致把原始精度模型直接加载到 GPU 的行为的参数。

使用 int8 量化 (To use int8 quantization):

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()

同样的，如需使用 int4 量化 (Similarly, to use int4 quantization):

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()

模型详情

模型结构

整体模型基于Baichuan-13B，为了获得更好的推理性能，Baichuan-13B 使用了 ALiBi 线性偏置技术，相对于 Rotary Embedding 计算量更小，对推理性能有显著提升；与标准的 LLaMA-13B 相比，生成 2000 个 tokens 的平均推理速度 (tokens/s)，实测提升 31.6%：

Model	tokens/s
LLaMA-13B	19.4
Baichuan-13B	25.4

具体参数和见下表

模型名称	隐含层维度	层数	头数	词表大小	总参数量	训练数据（tokens）	位置编码	最大长度
Baichuan-7B	4,096	32	32	64,000	7,000,559,616	1.2万亿	RoPE	4,096
Baichuan-13B	5,120	40	40	64,000	13,264,901,120	1.4万亿	ALiBi	4,096

训练详情

数据集主要由三部分组成：

在 sharegpt_zh 数据集中筛选的出 13k 高质量数据。
lima
按照任务类型挑选的 2.3k 高质量中文数据集，每个任务类型的数据量在 100 条左右。

硬件：8*A40

测评结果

CMMLU

Model 5-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
Baichuan-7B	34.4	47.5	47.6	46.6	44.3	44.0
Vicuna-13B	31.8	36.2	37.6	39.5	34.3	36.3
Chinese-Alpaca-Plus-13B	29.8	33.4	33.2	37.9	32.1	33.4
Chinese-LLaMA-Plus-13B	28.1	33.1	35.4	35.1	33.5	33.0
Ziya-LLaMA-13B-Pretrain	29.0	30.7	33.8	34.4	31.9	32.1
LLaMA-13B	29.2	30.8	31.6	33.0	30.5	31.2
moss-moon-003-base (16B)	27.2	30.4	28.8	32.6	28.7	29.6
Baichuan-13B-Base	41.7	61.1	59.8	59.0	56.4	55.3
Baichuan-13B-Chat	42.8	62.6	59.7	59.0	56.1	55.8
Baichuan-13B-Instruction	44.50	61.16	59.07	58.34	55.55	55.61

Model zero-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
ChatGLM2-6B	41.28	52.85	53.37	52.24	50.58	49.95
Baichuan-7B	32.79	44.43	46.78	44.79	43.11	42.33
ChatGLM-6B	32.22	42.91	44.81	42.60	41.93	40.79
BatGPT-15B	33.72	36.53	38.07	46.94	38.32	38.51
Chinese-LLaMA-13B	26.76	26.57	27.42	28.33	26.73	27.34
MOSS-SFT-16B	25.68	26.35	27.21	27.92	26.70	26.88
Chinese-GLM-10B	25.57	25.01	26.33	25.94	25.81	25.80
Baichuan-13B	42.04	60.49	59.55	56.60	55.72	54.63
Baichuan-13B-Chat	37.32	56.24	54.79	54.07	52.23	50.48
Baichuan-13B-Instruction	42.56	62.09	60.41	58.97	56.95	55.88

说明：CMMLU 是一个综合性的中文评估基准，专门用于评估语言模型在中文语境下的知识和推理能力。我们直接使用其官方的评测脚本对模型进行评测。Model zero-shot 表格中 Baichuan-13B-Chat 的得分来自我们直接运行 CMMLU 官方的评测脚本得到，其他模型的的得分来自于 CMMLU 官方的评测结果，Model 5-shot 中其他模型的得分来自于Baichuan-13B 官方的评测结果。