Baichuan-7B-Instruction

介绍

Baichuan-7B-Instruction 为 Baichuan-7B 系列模型进行指令微调后的版本，预训练模型可见 Baichuan-7B。

Demo

如下是一个使用 gradio 的模型 demo

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction",trust_remote_code=True,use_fast=False)
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction",trust_remote_code=True ).half()
model.cuda()

def generate(histories,  max_new_tokens=2048, do_sample = True, top_p = 0.95, temperature = 0.35, repetition_penalty=1.1):
    prompt = ""
    for history in histories:
        history_with_identity = "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
        prompt += history_with_identity
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
                    input_ids = input_ids,
                    max_new_tokens=max_new_tokens,
                    early_stopping=True,
                    do_sample=do_sample,
                    top_p=top_p, 
                    temperature=temperature,
                    repetition_penalty=repetition_penalty,
        )
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    generate_text = rets[0].replace(prompt, "")
    return generate_text
    
with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("clear")

    def user(user_message, history):
        return "", history + [[user_message, ""]]

    def bot(history):
        print(history)
        bot_message = generate(history)
        history[-1][1] = bot_message
        return history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0")

量化部署

Baichuan-7B 支持 int8 和 int4 量化，用户只需在推理代码中简单修改两行即可实现。请注意，如果是为了节省显存而进行量化，应加载原始精度模型到 CPU 后再开始量化；避免在 from_pretrained 时添加 device_map='auto' 或者其它会导致把原始精度模型直接加载到 GPU 的行为的参数。

使用 int8 量化 (To use int8 quantization):

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()

同样的，如需使用 int4 量化 (Similarly, to use int4 quantization):

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()

训练详情

数据集：https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k。

硬件：8*A40

测评结果

CMMLU

Model 5-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
Baichuan-7B	34.4	47.5	47.6	46.6	44.3	44.0
Vicuna-13B	31.8	36.2	37.6	39.5	34.3	36.3
Chinese-Alpaca-Plus-13B	29.8	33.4	33.2	37.9	32.1	33.4
Chinese-LLaMA-Plus-13B	28.1	33.1	35.4	35.1	33.5	33.0
Ziya-LLaMA-13B-Pretrain	29.0	30.7	33.8	34.4	31.9	32.1
LLaMA-13B	29.2	30.8	31.6	33.0	30.5	31.2
moss-moon-003-base (16B)	27.2	30.4	28.8	32.6	28.7	29.6
Baichuan-13B-Base	41.7	61.1	59.8	59.0	56.4	55.3
Baichuan-13B-Chat	42.8	62.6	59.7	59.0	56.1	55.8
Baichuan-13B-Instruction	44.50	61.16	59.07	58.34	55.55	55.61
Baichuan-7B-Instruction	34.68	47.38	47.13	45.11	44.51	43.57

Model zero-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
ChatGLM2-6B	41.28	52.85	53.37	52.24	50.58	49.95
Baichuan-7B	32.79	44.43	46.78	44.79	43.11	42.33
ChatGLM-6B	32.22	42.91	44.81	42.60	41.93	40.79
BatGPT-15B	33.72	36.53	38.07	46.94	38.32	38.51
Chinese-LLaMA-7B	26.76	26.57	27.42	28.33	26.73	27.34
MOSS-SFT-16B	25.68	26.35	27.21	27.92	26.70	26.88
Chinese-GLM-10B	25.57	25.01	26.33	25.94	25.81	25.80
Baichuan-13B	42.04	60.49	59.55	56.60	55.72	54.63
Baichuan-13B-Chat	37.32	56.24	54.79	54.07	52.23	50.48
Baichuan-13B-Instruction	42.56	62.09	60.41	58.97	56.95	55.88
Baichuan-7B-Instruction	33.94	46.31	47.73	45.84	44.88	43.53

说明：CMMLU 是一个综合性的中文评估基准，专门用于评估语言模型在中文语境下的知识和推理能力。我们直接使用其官方的评测脚本对模型进行评测。Model zero-shot 表格中 Baichuan-13B-Chat 的得分来自我们直接运行 CMMLU 官方的评测脚本得到，其他模型的的得分来自于 CMMLU 官方的评测结果.

英文能力评测

除了中文榜单的测试，我们同样测试了模型在英文榜单 MMLU 上的能力。

MMLU

MMLU 是一个包含了57种任务的英文评测数据集。我们采用了开源的评测方案 , 评测结果如下:

Model	Humanities	Social Sciences	STEM	Other	Average
LLaMA-7B²	34.0	38.3	30.5	38.1	35.1
Falcon-7B¹	-	-	-	-	35.0
mpt-7B¹	-	-	-	-	35.6
ChatGLM-6B⁰	35.4	41.0	31.3	40.5	36.9
BLOOM 7B⁰	25.0	24.4	26.5	26.4	25.5
BLOOMZ 7B⁰	31.3	42.1	34.4	39.0	36.1
moss-moon-003-base (16B)⁰	24.2	22.8	22.4	24.4	23.6
moss-moon-003-sft (16B)⁰	30.5	33.8	29.3	34.4	31.9
Baichuan-7B⁰	38.4	48.9	35.6	48.1	42.3
Baichuan-7B-Instruction(5-shot)	38.9	49.0	35.3	48.8	42.6
Baichuan-7B-Instruction(0-shot)	38.7	47.9	34.5	48.2	42.0