pskun committed on
Commit
b8dd855
1 Parent(s): b9fb8f8

Update README.md

Files changed (1)
  1. README.md +16 -36
README.md CHANGED
@@ -7,14 +7,6 @@ library_name: transformers
7
  pipeline_tag: text-generation
8
  ---
9
 
10
- # 姜子牙系列模型
11
-
12
- - [Ziya-LLaMA-13B-v1.1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1.1)
13
- - [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1)
14
- - [Ziya-LLaMA-7B-Reward](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-7B-Reward)
15
- - [Ziya-LLaMA-13B-Pretrain-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1)
16
- - [Ziya-BLIP2-14B-Visual-v1](https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1)
17
-
18
  ## 简介 Brief Introduction
19
 
20
  姜子牙写作大模型V2是基于LlaMa-2的130亿参数的指令微调模型,在写作任务上进行了能力增强,是专注于写作的大模型。姜子牙写作模型可以完成公文报告、讲稿书信、创意文案等多类的写作任务。
@@ -44,25 +36,27 @@ pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/
44
 
45
  最后,我们利用evol-instruct的方法,生成了约30万条高质量的通用指令数据。我们混合了通用指令数据和写作指令数据,这使得ziya-writing-v2不仅拥有良好的意图理解能力,也能够生成优秀的回答。
46
 
 
47
 
 
48
 
49
- ### 对齐学习 Alignment training
50
 
51
- 我们在实验中发现,利用少量人类标注的高质量的写作排序数据,使用强化学习训练模型,就能对进一步拔高模型的写作效果。
52
 
53
- 为了进一步提升模型的表现,使其能够充分理解人类意图、减少“幻觉”和不安全的输出,基于指令微调后的模型,进行了人类反馈训练(Human-Feedback Training,HFT)。在训练中,我们采用了以人类反馈强化学习(RM、PPO)为主。
54
 
55
- 我们在内部自研的框架上实现了HFT的训练流程,该框架可以利用最少8张40G的A100显卡完成Ziya-Writing-LLaMA-13B-v1的全参数训练。在PPO训练中,我们没有限制生成样本的长度,以确保长文本任务的奖励准确性。每次训练的总经验池尺寸超过100k样本,确保了训练的充分性。
56
 
57
- In our experiment, we found that by using a small amount of high-quality human-annotated writing ranking data and training the model with reinforcement learning, we could effectively improve the writing performance of the model.
58
 
59
- To further improve the performance of the model, enabling it to fully understand human intentions, reduce "hallucinations" and unsafe outputs, we conducted Human-Feedback Training (HFT) based on the model fine-tuned with instructions. In the training process, we used human feedback reinforcement learning (RM, PPO).
60
 
61
- We implemented the HFT training process on an internally developed framework, which can use a minimum of 8 40GB A100 GPUs to complete the full parameter training of Ziya-Writing-LLaMA-13B-v1. In the PPO training, we did not limit the length of the generated samples to ensure the accuracy of rewards for long-text tasks. The total experience pool size for each training exceeded 100k samples, ensuring the sufficiency of the training.
62
 
63
  ### 效果评估 Performance
64
 
65
- 写作文案的优劣评价是一个较为主观的评判,很难用一个准确率或者满意度的打分来衡量。因此,我们使用了匿名模型多人Side-by-Side评估的机制,收集了100条不同难度的写作指令数据进行评估,我们后续也会公开这个评测集。
66
 
67
  我们以胜出率作为评价模型好坏的指标,一个模型的胜出率计算公式为:
68
 
@@ -78,16 +72,10 @@ Win Rate = (Number of wins for the model + Number of draws / 2) / Total number o
78
 
79
  Generally, because most language models generate responses by sampling, a win rate above 55% indicates that the model significantly outperforms the other model, a win rate below 45% shows that it clearly lags behind, and a win rate between 45% and 55% means the two models are essentially on par.
80
 
81
- | Ziya-Writing-LLaMa-13B-v1 | 平均胜出率 | 最大胜出率 | 最小胜出率 |
82
- | :----: | :----: | :----: | :----: |
83
- | vs Ziya-LLaMa-13B-v1.1 | 70.7 | 73.5 | 69 |
84
- | vs baichuan-vicuna-7b | 69.6 | 73.5 | 68 |
85
- | vs Moss-16B | 65.1 | 69 | 62 |
86
- | vs ChatGLM2-6B | 58.3 | 61.5 | 56 |
87
- | vs Minimax-abab5 | 52.3 | 53 | 50.5 |
88
- | vs GPT-3.5-turbo | 44.7 | 49.5 | 38 |
89
 
90
- (注:最大胜出率和最小胜出率,是对每一个标注人员的标注结果进行单独统计,计算出最大和最小的得分;平均胜出率是对所有标注人员的标注结果进行汇总统计,计算出平均的得分。)
91
 
92
  ## <span id="jump"> 使用 Usage </span>
93
 
@@ -101,14 +89,14 @@ import torch
101
  device = torch.device("cuda")
102
 
103
  query="帮我写一份去西安的旅游计划"
104
- model = LlamaForCausalLM.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", torch_dtype=torch.float16, device_map="auto")
105
- tokenizer = AutoTokenizer.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", use_fast=False)
106
  inputs = '<human>:' + query.strip() + '\n<bot>:'
107
 
108
  input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
109
  generate_ids = model.generate(
110
  input_ids,
111
- max_new_tokens=2048,
112
  do_sample = True,
113
  top_p = 0.85,
114
  temperature = 0.85,
@@ -121,14 +109,6 @@ print(output)
121
 
122
  ```
123
 
124
- ## 微调示例 Finetune Example
125
-
126
- Refer to [ziya_finetune](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/ziya_llama)
127
-
128
- ## 推理量化示例 Inference & Quantization Example
129
-
130
- Refer to [ziya_inference](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/ziya_inference)
131
-
132
  ## 引用 Citation
133
 
134
  如果您在您的工作中使用了我们的模型,可以引用我们的[论文](https://arxiv.org/abs/2210.08590):
 
7
  pipeline_tag: text-generation
8
  ---
9
 
 
 
 
 
 
 
 
 
10
  ## 简介 Brief Introduction
11
 
12
  姜子牙写作大模型V2是基于LlaMa-2的130亿参数的指令微调模型,在写作任务上进行了能力增强,是专注于写作的大模型。姜子牙写作模型可以完成公文报告、讲稿书信、创意文案等多类的写作任务。
 
36
 
37
  最后,我们利用evol-instruct的方法,生成了约30万条高质量的通用指令数据。我们混合了通用指令数据和写作指令数据,这使得ziya-writing-v2不仅拥有良好的意图理解能力,也能够生成优秀的回答。
38
 
39
+ We have collected and cleaned a large amount of authentic human writing data from the internet. Using GPT-3.5, we generated corresponding writing prompts and conducted rigorous manual verification.
40
 
41
+ Additionally, we trained an Answer-to-Instruction model to generate high-quality enhanced writing prompt data from unsupervised writing data, further improving the quality of our data.
42
 
43
+ Based on this, we carefully selected more challenging writing prompts using a reward model and specific cleaning logic, filtering out simple data and ensuring prompt diversity.
44
 
45
+ Finally, using the evol-instruct method, we generated approximately 300,000 high-quality general instruction samples. Because we mix these with the writing instruction data, ziya-writing-v2 not only has strong intent understanding but also generates excellent responses.
46
 
47
+ ### 对齐学习 Alignment training
48
 
49
+ 我们使用GPT4、Minimax、Baichuan2、Qwen-14B等优秀的对话模型,对同一个指令生成不同的回答,我们利用奖励模型对不同的回答进行排序,形成偏好数据。
50
 
51
+ 我们使用了SFT-like Alignment的方法进行对齐训练,我们在内部自研的框架上实现了Alignment的训练流程,训练使用了8k的上下文窗口,一共约2万的偏好数据。
52
 
53
+ We use strong chat models such as GPT4, Minimax, Baichuan2, and Qwen-14B to generate different responses to the same instruction, then rank those responses with a reward model to form preference data.
54
 
55
+ We use an SFT-like Alignment method for alignment training and implement the training process on our internally developed framework. Training uses an 8k context window and about 20,000 preference data points in total.
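
The preference-data step described above can be illustrated with a minimal sketch. The README does not say how ranked responses are turned into training examples, so the pairing rule here (every higher-scored response is "chosen" over every lower-scored one, ties skipped) is an assumption, and `build_preference_pairs`, the record keys, and the scores are hypothetical names and placeholder values rather than the authors' pipeline.

```python
from itertools import combinations


def build_preference_pairs(instruction, scored_responses):
    """Turn reward-model scores for one instruction into preference pairs.

    scored_responses: list of (response_text, reward_score) tuples.
    Assumption: each strictly higher-scored response is preferred over each
    lower-scored one; tied responses produce no pair.
    """
    # Rank responses from highest to lowest reward score.
    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    pairs = []
    # combinations() preserves the ranked order, so the first element of each
    # pair never has a lower score than the second.
    for (better, hi_score), (worse, lo_score) in combinations(ranked, 2):
        if hi_score > lo_score:
            pairs.append(
                {"prompt": instruction, "chosen": better, "rejected": worse}
            )
    return pairs


# Placeholder responses and scores, standing in for outputs of several chat
# models scored by a reward model.
example = build_preference_pairs(
    "帮我写一份去西安的旅游计划",
    [("response A", 0.9), ("response B", 0.4), ("response C", 0.4)],
)
print(len(example), "preference pairs")
```

Pairing every response against every lower-ranked one yields the most pairs per instruction; keeping only the best-versus-worst pair would be an equally plausible choice.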
56
 
57
  ### 效果评估 Performance
58
 
59
+ 写作文案的优劣评价是一个较为主观的评判,很难用一个准确率或者满意度的打分来衡量。因此,我们使用了匿名模型多人Side-by-Side评估的机制,收集了170条不同难度的写作指令数据进行评估,我们后续也会公开这个评测集。
60
 
61
  我们以胜出率作为评价模型好坏的指标,一个模型的胜出率计算公式为:
62
 
 
72
 
73
  Generally, because most language models generate responses by sampling, a win rate above 55% indicates that the model significantly outperforms the other model, a win rate below 45% shows that it clearly lags behind, and a win rate between 45% and 55% means the two models are essentially on par.
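
As a worked illustration of the win-rate formula and the 45%/55% reading above, here is a small sketch. The function names and the tallies are hypothetical, and the denominator is taken to be the total number of side-by-side comparisons, which is an assumption about the formula's last term.

```python
def win_rate(wins: int, draws: int, total: int) -> float:
    """Win rate in percent: (wins + draws / 2) / total comparisons."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (wins + draws / 2) / total


def verdict(rate: float) -> str:
    """Map a win rate (in percent) to the three-way reading described above."""
    if rate > 55.0:
        return "significantly better"
    if rate < 45.0:
        return "clearly behind"
    return "on par"


# Hypothetical tallies for one model pair; not results from the evaluation.
rate = win_rate(wins=110, draws=20, total=170)
print(f"{rate:.1f}% -> {verdict(rate)}")
```

Counting a draw as half a win keeps the two models' win rates summing to 100%, so a single number describes each comparison.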
74
 
75
+ | Ziya-Writing-13B-v2 | 胜出率 |
76
+ | :----: | :----: |
77
+ | vs Ziya-Writing-LLaMa-13B-v1 | 72.5 |
 
 
 
 
 
78
 
 
79
 
80
  ## <span id="jump"> 使用 Usage </span>
81
 
 
89
  device = torch.device("cuda")
90
 
91
  query="帮我写一份去西安的旅游计划"
92
+ model = LlamaForCausalLM.from_pretrained("IDEA-CCNL/Ziya-Writing-13B-v2", torch_dtype=torch.float16, device_map="auto")
93
+ tokenizer = AutoTokenizer.from_pretrained("IDEA-CCNL/Ziya-Writing-13B-v2", use_fast=False)
94
  inputs = '<human>:' + query.strip() + '\n<bot>:'
95
 
96
  input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
97
  generate_ids = model.generate(
98
  input_ids,
99
+ max_new_tokens=4096,
100
  do_sample = True,
101
  top_p = 0.85,
102
  temperature = 0.85,
 
109
 
110
  ```
111
 
 
 
 
 
 
 
 
 
112
  ## 引用 Citation
113
 
114
  如果您在您的工作中使用了我们的模型,可以引用我们的[论文](https://arxiv.org/abs/2210.08590):