Ziya-Writing-13B-v2 / README.md
pskun's picture
Update README.md
9ccff35
|
raw
history blame
9.19 kB
metadata
license: gpl-3.0
language:
  - zh
  - en
library_name: transformers
pipeline_tag: text-generation

姜子牙系列模型

简介 Brief Introduction

姜子牙写作大模型V2是基于LlaMa-2的130亿参数的指令微调模型,在写作任务上进行了能力增强,是专注于写作的大模型。姜子牙写作模型可以完成公文报告、讲稿书信、创意文案等多类的写作任务。

Ziya-Writing-13B-v2 is a 13-billion parameter instruction fine-tuned model based on LlaMa-2, which has been enhanced for better performance in writing tasks. It is a large model that focuses on writing. Ziya-Writing-LLaMa-13B-v1 can handle several types of writing tasks, including official reports, speeches, creative copywriting, and more.

软件依赖

pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/transformers

模型分类 Model Taxonomy

需求 Demand 任务 Task 系列 Series 模型 Model 参数 Parameter 额外 Extra
写作 Writing AGI模型 姜子牙 Ziya LLaMA2 13B English&Chinese

模型信息 Model Information

有监督微调 Supervised finetuning

我们从网络中收集并清洗了大量真实的真人写作数据,利用GPT-3.5生成对应的写作指令,并进行了极为严格的人工校验。

同时,我们训练了一个Answer-to-Instruction的模型,用于从无监督写作数据中,生成高质量的增强的写作指令数据,进一步提高了我们的数据质量。

在此基础上,我们利用奖励模型和一定的清洗逻辑,精心挑选了难度更高的写作指令,剔除了简单的数据,并保证了指令的多样性。

最后,我们利用evol-instruct的方法,生成了约30万条高质量的通用指令数据。我们混合了通用指令数据和写作指令数据,这使得ziya-writing-v2不仅拥有良好的意图理解能力,也能够生成优秀的回答。

对齐学习 Alignment training

我们在实验中发现,利用少量人类标注的高质量的写作排序数据,使用强化学习训练模型,就能对进一步拔高模型的写作效果。

为了进一步提升模型的表现,使其能够充分理解人类意图、减少“幻觉”和不安全的输出,基于指令微调后的模型,进行了人类反馈训练(Human-Feedback Training,HFT)。在训练中,我们采用了以人类反馈强化学习(RM、PPO)为主。

我们在内部自研的框架上实现了HFT的训练流程,该框架可以利用最少8张40G的A100显卡完成Ziya-Writing-LLaMA-13B-v1的全参数训练。在PPO训练中,我们没有限制生成样本的长度,以确保长文本任务的奖励准确性。每次训练的总经验池尺寸超过100k样本,确保了训练的充分性。

In our experiment, we found that by using a small amount of high-quality human-annotated writing ranking data and training the model with reinforcement learning, we could effectively improve the writing performance of the model.

To further improve the performance of the model, enabling it to fully understand human intentions, reduce "hallucinations" and unsafe outputs, we conducted Human-Feedback Training (HFT) based on the model fine-tuned with instructions. In the training process, we used human feedback reinforcement learning (RM, PPO).

We implemented the HFT training process on an internally developed framework, which can use a minimum of 8 40GB A100 GPUs to complete the full parameter training of Ziya-Writing-LLaMA-13B-v1. In the PPO training, we did not limit the length of the generated samples to ensure the accuracy of rewards for long-text tasks. The total experience pool size for each training exceeded 100k samples, ensuring the sufficiency of the training.

效果评估 Performance

写作文案的优劣评价是一个较为主观的评判,很难用一个准确率或者满意度的打分来衡量。因此,我们使用了匿名模型多人Side-by-Side评估的机制,收集了100条不同难度的写作指令数据进行评估,我们后续也会公开这个评测集。

我们以胜出率作为评价模型好坏的指标,一个模型的胜出率计算公式为:

胜出率=(该模型的胜出数量+打平数量/2)/总标注数

一般而言,由于语言模型大多基于采样来生成回答,因此胜出率大于55%表示该模型显著胜出于另外一个模型,胜出率小于45%表示该模型明显落后,胜出率在45%至55%之间表示两个模型基本持平。

The evaluation of the quality of a writing task is quite subjective, making it difficult to measure with precise accuracy or satisfaction score. Therefore, we've used an anonymous multi-person Side-by-Side evaluation mechanism, and have collected 100 pieces of writing instruction data of different difficulties for evaluation. We will also make this evaluation set public in the future.

We use the win rate as an indicator of the quality of a model. The formula to calculate a model's win rate is as follows:

Win Rate = (Number of wins for the model + Number of draws / 2) / Total number of annotations

Generally, since most language models generate responses based on sampling, hence, a win rate greater than 55% indicates that the model significantly outperforms another model, a win rate less than 45% shows that the model clearly lags behind, and a win rate between 45% and 55% signifies that the two models are essentially on par.

Ziya-Writing-LLaMa-13B-v1 平均胜出率 最大胜出率 最小胜出率
vs Ziya-LLaMa-13B-v1.1 70.7 73.5 69
vs baichuan-vicuna-7b 69.6 73.5 68
vs Moss-16B 65.1 69 62
vs ChatGLM2-6B 58.3 61.5 56
vs Minimax-abab5 52.3 53 50.5
vs GPT-3.5-turbo 44.7 49.5 38

(注:最大胜出率和最小胜出率,是对每一个标注人员的标注结果进行单独统计,计算出最大和最小的得分;平均胜出率是对所有标注人员的标注结果进行汇总统计,计算出平均的得分。)

使用 Usage

由于LLaMA权重的许可限制,该模型不能用于商业用途,请严格遵守LLaMA的使用政策。

from transformers import AutoTokenizer
from transformers import LlamaForCausalLM
import torch

device = torch.device("cuda")

query="帮我写一份去西安的旅游计划"
model = LlamaForCausalLM.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", use_fast=False)
inputs = '<human>:' + query.strip() + '\n<bot>:'
      
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
generate_ids = model.generate(
            input_ids,
            max_new_tokens=2048, 
            do_sample = True, 
            top_p = 0.85, 
            temperature = 0.85, 
            repetition_penalty=1., 
            eos_token_id=2, 
            bos_token_id=1, 
            pad_token_id=0)
output = tokenizer.batch_decode(generate_ids)[0]
print(output)

微调示例 Finetune Example

Refer to ziya_finetune

推理量化示例 Inference & Quantization Example

Refer to ziya_inference

引用 Citation

如果您在您的工作中使用了我们的模型,可以引用我们的论文

If you are using the resource for your work, please cite the our paper:

@article{fengshenbang,
  author    = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
  title     = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
  journal   = {CoRR},
  volume    = {abs/2209.02970},
  year      = {2022}
}

You can also cite our website:

欢迎引用我们的网站:

@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}