---
library_name: transformers
datasets:
- shareAI/ShareGPT-Chinese-English-90k
- FreedomIntelligence/ShareGPT-CN
language:
- zh
pipeline_tag: question-answering
tags:
- chat
- llm
- llama2
- chatgpt
---

- GitHub: https://github.com/CrazyBoyM/llama2-Chinese-chat

Updates:

- 2023-07-19: First llama2 13b Chinese chat version released.
- 2023-07-23: Second training epoch finished and released; testing shows a better chat experience.
- 2023-08-03: Branch version bimoGPT released, with self-identity awareness and decent code Q&A ability. Download: https://huggingface.co/shareAI/bimoGPT-llama2-13b
- 2023-08-21: Moved up on the world-model leaderboard, now ranked more than ten places above the paid model of a community that bills itself as the "official Chinese Llama2".

Download of the fully merged weights: https://www.codewithgpu.com/m/file/llama2-13b-Chinese-chat

- Training dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k
- QQ group for llama2 training discussion: 443064756

This is llama2 Chinese chat 13b, trained on a Chinese ShareGPT dataset. To keep the download small, only the adapter weights are released in this repo.
Please pull https://huggingface.co/TheBloke/Llama-2-13B-fp16 as the base weights and run the following script to merge them into a full working checkpoint:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_path = '/data/TheBloke/Llama-2-13B-fp16'    # base weights
adapter_name_or_path = '/data/llama2-13b-Chinese-chat'    # adapter weights from this repo
save_path = '/data/llama2-13b-Chinese-chat_v1'            # output path for the merged model

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto'
)
print("load model success")
model = PeftModel.from_pretrained(model, adapter_name_or_path)
print("load adapter success")
model = model.merge_and_unload()
print("merge success")

tokenizer.save_pretrained(save_path)
model.save_pretrained(save_path)
print("save done.")
```

After merging, try chatting with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


def main():
    model_name = '/data/llama2-13b-Chinese-chat_v1'

    device = 'cuda'
    max_new_tokens = 500      # maximum number of tokens generated per turn
    history_max_len = 2000    # maximum number of history tokens the model remembers
    top_p = 0.9
    temperature = 0.35        # the higher it is, the more freewheeling the model gets
    repetition_penalty = 1.2  # raise this if the model starts repeating itself

    # load the merged model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto'
    ).to(device).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if model.config.model_type == 'llama' else True
    )

    # running token-id history of the whole conversation, starting from <s>
    history_token_ids = tokenizer('<s>', return_tensors="pt").input_ids

    # start the chat loop
    user_input = input('User:')
    while True:
        # close each user turn with </s>, matching the ShareGPT-style training format
        user_input = '{}</s>'.format(user_input)
        user_input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids
        history_token_ids = torch.concat((history_token_ids, user_input_ids), dim=1)
        model_input_ids = history_token_ids[:, -history_max_len:].to(device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=model_input_ids,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                top_p=top_p,
                temperature=temperature,
                repetition_penalty=repetition_penalty,
                eos_token_id=tokenizer.eos_token_id
            )
        model_input_ids_len = model_input_ids.size(1)
        response_ids = outputs[:, model_input_ids_len:]
        history_token_ids = torch.concat((history_token_ids, response_ids.cpu()), dim=1)
        response = tokenizer.batch_decode(response_ids)
        print("Bot:" + response[0].strip().replace('</s>', ""))
        user_input = input('User:')


if __name__ == '__main__':
    main()
```

Further fine-tuning on top of this checkpoint is recommended to tune the chat behavior for your own use case.
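If you do continue training, a minimal sketch of wrapping the merged checkpoint with a fresh LoRA adapter via `peft` is shown below. This is not the project's original training code: the LoRA hyperparameters and target modules are placeholders to tune for your own data, and the base can also be loaded in 4-bit as listed under Training procedure (see the `BitsAndBytesConfig` sketch at the end of this card).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# merged checkpoint produced by the merge script above
model_name = '/data/llama2-13b-Chinese-chat_v1'

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map='auto',
    trust_remote_code=True,
)

# placeholder LoRA hyperparameters; adjust them for your own data
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# train with your own SFT loop / Trainer on ShareGPT-style conversations,
# then merge the new adapter back with merge_and_unload() as shown above
```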
## Training procedure

The following `bitsandbytes` quantization config was used during training (a sketch of the equivalent `BitsAndBytesConfig` is given at the end of this card):

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16

### Framework versions

- PEFT 0.4.0.dev0

Trained for 1 epoch to a loss of 0.9. In hands-on testing the Chinese chat experience is better than baichuan13b (a purely subjective impression). There is still plenty of headroom, so we recommend pulling the files and continuing to fine-tune with this model as the base.

Thanks to:

- LLaMA2
- the Firefly project
- the builders of the Chinese ShareGPT datasets
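For reference, the quantization settings listed under Training procedure correspond roughly to the following `transformers` `BitsAndBytesConfig`. This is a sketch for readers who want to reproduce the QLoRA-style 4-bit loading, not the original training script; the base-model path is the one from the merge example above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit NF4 with double quantization and fp16 compute,
# matching the bitsandbytes config listed under "Training procedure"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    '/data/TheBloke/Llama-2-13B-fp16',  # base weights from the merge example
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)  # ready for QLoRA-style adapter training
```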