	---
	language:
	- zh
	- en
	pipeline_tag: text-generation
	inference: false

	---

	# Baichuan-7B-Instruction

	![](./alpachino.png)

	<!-- Provide a quick summary of what the model is/does. -->

	## Introduction

	Baichuan-7B-Instruction is the instruction-tuned version of the Baichuan-7B model. The pretrained base model is available at [Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B).


	## Demo

	Below is a model demo built with Gradio:

	```python
	import gradio as gr
	from transformers import AutoTokenizer, AutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True, use_fast=False)
	model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True).half()
	model.cuda()

	def generate(histories, max_new_tokens=2048, do_sample=True, top_p=0.95, temperature=0.35, repetition_penalty=1.1):
	    # Build the prompt from [user, assistant] turns; the last turn has an empty reply.
	    prompt = ""
	    for history in histories:
	        prompt += "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
	    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
	    # early_stopping is omitted because it only affects beam search.
	    outputs = model.generate(
	        input_ids=input_ids,
	        max_new_tokens=max_new_tokens,
	        do_sample=do_sample,
	        top_p=top_p,
	        temperature=temperature,
	        repetition_penalty=repetition_penalty,
	    )
	    # Decode only the newly generated tokens rather than stripping the prompt
	    # with str.replace, which fails if decoding does not round-trip exactly.
	    new_tokens = outputs[0][input_ids.shape[1]:]
	    return tokenizer.decode(new_tokens, skip_special_tokens=True)

	with gr.Blocks() as demo:
	    chatbot = gr.Chatbot()
	    msg = gr.Textbox()
	    clear = gr.Button("clear")

	    def user(user_message, history):
	        # Clear the textbox and append the new user turn with an empty reply.
	        return "", history + [[user_message, ""]]

	    def bot(history):
	        print(history)  # log the conversation for debugging
	        history[-1][1] = generate(history)
	        return history

	    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
	        bot, chatbot, chatbot
	    )
	    clear.click(lambda: None, None, chatbot, queue=False)

	if __name__ == "__main__":
	    demo.launch(server_name="0.0.0.0")
	```
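
The prompt format that `generate` assembles in the demo can be checked on its own, without loading the model. The following is a minimal sketch of that step; the helper name `build_prompt` and the sample conversation are illustrative and not part of the original demo.

```python
from typing import Sequence


def build_prompt(histories: Sequence[Sequence[str]]) -> str:
    """Concatenate [user, assistant] turns in the demo's Human/Assistant format.

    `build_prompt` is a hypothetical helper; it mirrors the loop inside the
    demo's `generate` function.
    """
    prompt = ""
    for user_message, assistant_message in histories:
        prompt += "\nHuman:" + user_message + "\n\nAssistant:" + assistant_message
    return prompt


if __name__ == "__main__":
    # The last turn has an empty assistant reply, so the prompt ends with
    # "Assistant:" and the model continues from there.
    history = [["Hello", "Hi, how can I help?"], ["What is CMMLU?", ""]]
    print(repr(build_prompt(history)))
```

Because the final turn carries an empty reply, the prompt always ends with `Assistant:`, which is where generation picks up.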

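	## Quantized Deployment

	Baichuan-7B supports int8 and int4 quantization; only two lines of the inference code need to change. Note that if you quantize in order to save GPU memory, load the original-precision model onto the CPU first and then quantize it. Avoid passing `device_map='auto'` to `from_pretrained`, or any other argument that would load the original-precision model directly onto the GPU.

	To use int8 quantization: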

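	```python
	import torch
	from transformers import AutoModelForCausalLM

	# Load in fp16 on the CPU first, then quantize and move to the GPU.
	model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
	model = model.quantize(8).cuda()
	```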

	Similarly, to use int4 quantization:

	```python
	model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
	model = model.quantize(4).cuda()
	```
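
For a rough sense of why quantization saves GPU memory, the sketch below estimates the memory taken by the weights alone at each precision. It assumes a nominal 7 billion parameters (the actual count may differ) and ignores activations, the KV cache, and any per-group scale factors that a quantization scheme stores; the function name is illustrative.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: a nominal 7e9 parameters, taken from the "7B" in the model name.
NOMINAL_PARAMS = 7e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Halving the bit width halves the weight footprint, so under these assumptions int4 needs about a quarter of the fp16 weight memory.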

	## Training Details

	Dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k

	Hardware: 8 × A40 GPUs

	## Evaluation Results

	## [CMMLU](https://github.com/haonan-li/CMMLU)

	| Model (5-shot) | STEM | Humanities | Social Sciences | Others | China Specific | Average |
	| --- | :---: | :---: | :---: | :---: | :---: | :---: |
	| Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
	| Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
	| Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
	| Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
	| Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
	| LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
	| moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
	| Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
	| Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |
	| Baichuan-13B-Instruction | 44.50 | 61.16 | 59.07 | 58.34 | 55.55 | 55.61 |
	| Baichuan-7B-Instruction | 34.68 | 47.38 | 47.13 | 45.11 | 44.51 | 43.57 |

	| Model (zero-shot) | STEM | Humanities | Social Sciences | Others | China Specific | Average |
	| --- | :---: | :---: | :---: | :---: | :---: | :---: |
	| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b) | 41.28 | 52.85 | 53.37 | 52.24 | 50.58 | 49.95 |
	| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) | 32.79 | 44.43 | 46.78 | 44.79 | 43.11 | 42.33 |
	| [ChatGLM-6B](https://github.com/THUDM/GLM-130B) | 32.22 | 42.91 | 44.81 | 42.60 | 41.93 | 40.79 |
	| [BatGPT-15B](https://arxiv.org/abs/2307.00360) | 33.72 | 36.53 | 38.07 | 46.94 | 38.32 | 38.51 |
	| [Chinese-LLaMA-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 26.76 | 26.57 | 27.42 | 28.33 | 26.73 | 27.34 |
	| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 25.68 | 26.35 | 27.21 | 27.92 | 26.70 | 26.88 |
	| [Chinese-GLM-10B](https://github.com/THUDM/GLM) | 25.57 | 25.01 | 26.33 | 25.94 | 25.81 | 25.80 |
	| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) | 42.04 | 60.49 | 59.55 | 56.60 | 55.72 | 54.63 |
	| [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) | 37.32 | 56.24 | 54.79 | 54.07 | 52.23 | 50.48 |
	| Baichuan-13B-Instruction | 42.56 | 62.09 | 60.41 | 58.97 | 56.95 | 55.88 |
	| Baichuan-7B-Instruction | 33.94 | 46.31 | 47.73 | 45.84 | 44.88 | 43.53 |

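	> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We evaluated the models directly with its official [evaluation script](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score for [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) comes from our own run of the official CMMLU evaluation script; the scores for the other models are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results.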
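
To make the comparison with the base model explicit, the sketch below computes the per-category difference between Baichuan-7B-Instruction and Baichuan-7B using the scores from the two CMMLU tables above. The dictionary layout and the function name `score_deltas` are illustrative.

```python
CATEGORIES = ["STEM", "Humanities", "Social Sciences", "Others", "China Specific", "Average"]

# Scores copied from the CMMLU tables above, in the order of CATEGORIES.
SCORES = {
    "5-shot": {
        "Baichuan-7B": [34.4, 47.5, 47.6, 46.6, 44.3, 44.0],
        "Baichuan-7B-Instruction": [34.68, 47.38, 47.13, 45.11, 44.51, 43.57],
    },
    "zero-shot": {
        "Baichuan-7B": [32.79, 44.43, 46.78, 44.79, 43.11, 42.33],
        "Baichuan-7B-Instruction": [33.94, 46.31, 47.73, 45.84, 44.88, 43.53],
    },
}


def score_deltas(setting: str) -> dict:
    """Return Instruction minus base score per category, rounded to 2 decimals."""
    base = SCORES[setting]["Baichuan-7B"]
    tuned = SCORES[setting]["Baichuan-7B-Instruction"]
    return {c: round(t - b, 2) for c, b, t in zip(CATEGORIES, base, tuned)}


if __name__ == "__main__":
    for setting in SCORES:
        print(setting, score_deltas(setting))
```

With these numbers, the instruction-tuned model scores higher than the base model in every zero-shot category, while its 5-shot average is 0.43 points lower.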

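	### English Evaluation
	In addition to the Chinese benchmark, we also evaluated the model on the English benchmark MMLU.

	#### MMLU

	[MMLU](https://arxiv.org/abs/2009.03300) is an English evaluation dataset comprising 57 tasks.
	We used the open-source [evaluation scheme](https://github.com/hendrycks/test); the results are as follows: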

	| Model | Humanities | Social Sciences | STEM | Other | Average |
	|----------------------------------------|-----------:|:---------------:|:----:|:-----:|:-------:|
	| LLaMA-7B<sup>2</sup> | 34.0 | 38.3 | 30.5 | 38.1 | 35.1 |
	| Falcon-7B<sup>1</sup> | - | - | - | - | 35.0 |
	| mpt-7B<sup>1</sup> | - | - | - | - | 35.6 |
	| ChatGLM-6B<sup>0</sup> | 35.4 | 41.0 | 31.3 | 40.5 | 36.9 |
	| BLOOM 7B<sup>0</sup> | 25.0 | 24.4 | 26.5 | 26.4 | 25.5 |
	| BLOOMZ 7B<sup>0</sup> | 31.3 | 42.1 | 34.4 | 39.0 | 36.1 |
	| moss-moon-003-base (16B)<sup>0</sup> | 24.2 | 22.8 | 22.4 | 24.4 | 23.6 |
	| moss-moon-003-sft (16B)<sup>0</sup> | 30.5 | 33.8 | 29.3 | 34.4 | 31.9 |
	| Baichuan-7B<sup>0</sup> | 38.4 | 48.9 | 35.6 | 48.1 | 42.3 |
	| Baichuan-7B-Instruction (5-shot) | 38.9 | 49.0 | 35.3 | 48.8 | 42.6 |
	| Baichuan-7B-Instruction (0-shot) | 38.7 | 47.9 | 34.5 | 48.2 | 42.0 |