IDEA-CCNL/Ziya-LLaMA-13B-v1

@@ -20,8 +20,14 @@ language:
 姜子牙通用大模型V1是基于LLaMa的130亿参数的大规模预训练模型，具备翻译，编程，文本分类，信息抽取，摘要，文案生成，常识问答和数学计算等能力。目前姜子牙通用大模型已完成大规模预训练、多任务有监督微调和人类反馈学习三阶段的训练过程。
 The Ziya-LLaMA-13B-v1 is a large-scale pre-trained model based on LLaMA with 13 billion parameters. It has the ability to perform tasks such as translation, programming, text classification, information extraction, summarization, copywriting, common sense Q&A, and mathematical calculation. The Ziya-LLaMA-13B-v1 has undergone three stages of training: large-scale continual pre-training (PT), multi-task supervised fine-tuning (SFT), and human feedback learning (RM, PPO).
 ## 模型分类 Model Taxonomy
 |  需求 Demand  | 任务 Task       | 系列 Series      | 模型 Model    | 参数 Parameter | 额外 Extra |
@@ -81,17 +87,31 @@ We implemented the HFT training process on an internally developed framework, wh
 ## 使用 Usage
 ```python3
 from transformers import AutoTokenizer
 from transformers import LlamaForCausalLM
 import torch
 device = torch.device("cuda")
 query="帮我写一份去西安的旅游计划"
-model = LlamaForCausalLM.from_pretrained('IDEA-CCNL/Ziya-LLaMA-13B-v1', torch_dtype=torch.float16, device_map="auto")
-tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Ziya-LLaMA-13B-v1', use_fast=False)
 inputs = '<human>:' + query.strip() + '\n<bot>:'
 input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
@@ -108,6 +128,47 @@ generate_ids = model.generate(
 output = tokenizer.batch_decode(generate_ids)[0]
 print(output)
 ```
@@ -137,4 +198,4 @@ You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
   year={2021},
   howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
 }
-```

 姜子牙通用大模型V1是基于LLaMa的130亿参数的大规模预训练模型，具备翻译，编程，文本分类，信息抽取，摘要，文案生成，常识问答和数学计算等能力。目前姜子牙通用大模型已完成大规模预训练、多任务有监督微调和人类反馈学习三阶段的训练过程。
 The Ziya-LLaMA-13B-v1 is a large-scale pre-trained model based on LLaMA with 13 billion parameters. It has the ability to perform tasks such as translation, programming, text classification, information extraction, summarization, copywriting, common sense Q&A, and mathematical calculation. The Ziya-LLaMA-13B-v1 has undergone three stages of training: large-scale continual pre-training (PT), multi-task supervised fine-tuning (SFT), and human feedback learning (RM, PPO).
+## 软件依赖 Software Dependencies
+```
+pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/transformers
+```
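The pinned versions above can be checked from Python before loading the model. The following is a minimal sketch, not part of the original model card: `PINNED` and `check_versions` are hypothetical names, and the comparison ignores local build suffixes such as `+cu113` on the assumption that they do not matter for compatibility.

```python
from importlib import metadata

# Pins copied from the pip command above. transformers is installed from git,
# so no exact version is pinned for it (None means "any installed version").
PINNED = {"torch": "1.12.1", "tokenizers": "0.13.3", "transformers": None}


def check_versions(pins):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Assumption: strip a local build suffix like "+cu113" before comparing.
        public = installed.split("+")[0] if installed is not None else None
        ok = public is not None and (wanted is None or public == wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(PINNED).items():
        print(f"{name}: installed={installed} ok={ok}")
```

A package that is missing is reported as `(None, False)` rather than raising, so the whole report can be printed in one pass.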
 ## 模型分类 Model Taxonomy
 |  需求 Demand  | 任务 Task       | 系列 Series      | 模型 Model    | 参数 Parameter | 额外 Extra |
 ## 使用 Usage
+由于LLaMA权重的许可限制，该模型不能用于商业用途，请严格遵守LLaMA的使用政策。考虑到LLaMA权重的许可证限制，我们无法直接发布完整的模型权重。因此，我们使用了[FastChat开源工具](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/apply_delta.py)作为基础，并对其进行了进一步的优化。我们计算并发布了Ziya-LLaMA-13B-v1权重与原始LLaMA权重之间的差值。用户可以按照以下步骤操作以获得Ziya-LLaMA-13B-v1完整权重，具体步骤如下：
+Step 1:获取[LLaMA](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform)权重并转成Hugging Face Transformers模型格式，可参考转换[脚本](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py)（若已经有huggingface权重则跳过）
+```
+python src/transformers/models/llama/convert_llama_weights_to_hf.py \
+    --input_dir /path/to/downloaded/llama/weights --model_size 13B --output_dir /output/path
+```
+Step 2:下载Ziya-LLaMA-13B-v1的delta权重以及step 1中转换好的原始LLaMA权重，使用如下脚本转换：https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/utils/apply_delta.py
+```
+python3 -m apply_delta --base ~/model_weights/llama-13b --target ~/model_weights/Ziya-LLaMA-13B --delta ~/model_weights/Ziya-LLaMA-13B-v1
+```
+Step 3: 加载step 2得到的模型推理
 ```python3
 from transformers import AutoTokenizer
 from transformers import LlamaForCausalLM
 import torch
 device = torch.device("cuda")
+ckpt = '基于delta参数合并后的完整模型权重'
 query="帮我写一份去西安的旅游计划"
+model = LlamaForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(ckpt, use_fast=False)
 inputs = '<human>:' + query.strip() + '\n<bot>:'
 input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
 output = tokenizer.batch_decode(generate_ids)[0]
 print(output)
+```
+NOTE: Due to the licensing restrictions on the LLaMA weights, this model may not be used for commercial purposes. Please strictly comply with LLaMA's usage policy. For the same reason, we cannot release the complete model weights directly. Instead, we built on [the open-source FastChat tool](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/apply_delta.py), optimized it further, and computed and released the difference (delta) between the Ziya-LLaMA-13B-v1 weights and the original LLaMA weights. To obtain the complete Ziya-LLaMA-13B-v1 weights, follow these steps:
+Step 1: Obtain the [LLaMA](https://huggingface.co/docs/transformers/main/en/model_doc/llama#overview) weights and convert them into the Hugging Face Transformers format. You can refer to the [script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) (skip this step if you already have the Hugging Face weights).
+```
+python src/transformers/models/llama/convert_llama_weights_to_hf.py \
+    --input_dir /path/to/downloaded/llama/weights --model_size 13B --output_dir /output/path
+```
+Step 2: Download the delta weights for Ziya-LLaMA-13B-v1, then merge them with the original LLaMA weights converted in Step 1 using the following script: https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/utils/apply_delta.py
+```
+# --delta is the path to the delta weights downloaded from Hugging Face
+python3 -m apply_delta --base ~/model_weights/llama-13b --target ~/model_weights/Ziya-LLaMA-13B --delta ~/model_weights/Ziya-LLaMA-13B-v1
+```
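Conceptually, merging a delta means adding the released difference back onto the base weights, parameter by parameter. The sketch below illustrates only that arithmetic on plain Python lists. It is not the `apply_delta.py` script itself, which works on real model checkpoints and may handle details this toy omits; the function names and the tiny weight dictionaries are hypothetical.

```python
def make_delta(base, target):
    """Compute delta[name] = target[name] - base[name] for every parameter."""
    if base.keys() != target.keys():
        raise ValueError("base and target must contain the same parameter names")
    return {
        name: [t - b for b, t in zip(base[name], target[name])]
        for name in base
    }


def apply_delta(base, delta):
    """Recover target[name] = base[name] + delta[name] for every parameter."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    target = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy weights standing in for the base (LLaMA) and fine-tuned (Ziya) models.
base_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
tuned_weights = {"layer.weight": [1.5, 1.0], "layer.bias": [0.25]}

delta_weights = make_delta(base_weights, tuned_weights)
merged_weights = apply_delta(base_weights, delta_weights)
print(merged_weights == tuned_weights)
```

Only the delta needs to be distributed: anyone who already holds the base weights can reconstruct the fine-tuned weights, while the delta alone does not contain the base model.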
+Step 3: Load the model obtained in Step 2 for inference.
+```python3
+from transformers import AutoTokenizer
+from transformers import LlamaForCausalLM
+import torch
+device = torch.device("cuda")
+ckpt = 'path/to/Ziya-LLaMA-13B'  # full model weights produced by merging the delta in Step 2
+query="帮我写一份去西安的旅游计划"
+model = LlamaForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(ckpt, use_fast=False)
+inputs = '<human>:' + query.strip() + '\n<bot>:'
+input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
+generate_ids = model.generate(
+    input_ids,
+    max_new_tokens=1024,
+    do_sample=True,
+    top_p=0.85,
+    temperature=1.0,
+    repetition_penalty=1.0,
+    eos_token_id=2,
+    bos_token_id=1,
+    pad_token_id=0,
+)
+output = tokenizer.batch_decode(generate_ids)[0]
+print(output)
 ```
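The inference example wraps the query in a `<human>:` / `<bot>:` template, and the decoded output contains that prompt followed by the reply. A small sketch of helpers for building the prompt and pulling out just the reply is shown here. `build_prompt` and `extract_reply` are hypothetical names, and the `</s>` end-of-sequence string is an assumption about how the tokenizer renders `eos_token_id=2`.

```python
def build_prompt(query: str) -> str:
    """Wrap a user query in the single-turn template used in the example."""
    return "<human>:" + query.strip() + "\n<bot>:"


def extract_reply(decoded: str, eos_token: str = "</s>") -> str:
    """Return the text after the first '<bot>:' marker, cut at the EOS string.

    Assumption: `decoded` is a single-turn output that starts with the prompt
    produced by build_prompt, possibly followed by an EOS string.
    """
    # Keep everything after the prompt's "<bot>:" marker (or the whole string
    # if the marker is absent).
    reply = decoded.split("<bot>:", 1)[-1]
    # Drop the EOS string and anything after it, if present.
    reply = reply.split(eos_token, 1)[0]
    return reply.strip()


prompt = build_prompt("  帮我写一份去西安的旅游计划 ")
print(repr(prompt))
```

With the model loaded as in Step 3, `extract_reply(output)` would be applied to the string returned by `tokenizer.batch_decode(generate_ids)[0]`.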
   year={2021},
   howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
 }
+```