YuxinJiang committed on
Commit
c94d951
1 Parent(s): 619b7fd

Update README.md

Files changed (1)
  1. README.md +264 -1
README.md CHANGED
@@ -2,4 +2,267 @@
2
  license: mit
3
  ---
4
 
5
- The model weights will come soon!
5
+ # To comply with the LLaMA model license, we release Lion weights as _delta weights_.
6
+
7
+ # Lion: Adversarial Distillation of Closed-Source Large Language Model
8
+
9
+ <p align="center" width="100%">
10
+ <a ><img src="pics/Lion.jpg" alt="Lion" style="width: 20%; min-width: 200px; display: block; margin: auto;"></a>
11
+ </p>
12
+ <p align="center">
13
+ <a href="https://arxiv.org/abs/2305.12870">[📄 Paper]</a> |
14
+ <a href="https://huggingface.co/YuxinJiang/Lion">[🤗 Lion Weights]</a> |
15
+ <a href="https://84bc5e1fdfbb976d51.gradio.live/">[:desktop_computer: Demo]</a>
16
+ </p>
17
+ <hr>
18
+
19
+
20
+ [![Code License](https://img.shields.io/badge/code%20license-MIT-green)](https://github.com/YJiangcm/Lion/blob/master/LICENSE)
21
+ [![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)
22
+ [![Weight Diff License](https://img.shields.io/badge/Weight%20Diff%20License-CC%20By%20NC%204.0-yellow)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/WEIGHT_DIFF_LICENSE)
23
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
24
+
25
+ <!-- The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes. -->
26
+ <!-- The weight diff is also CC BY NC 4.0 (allowing only non-commercial use). -->
27
+
28
+ ### Tuned on 70k instruction-following examples, Lion (7B) achieves 95% of ChatGPT's capability!
29
+
30
+
31
+ ## News
32
+ - **[May 26, 2023]** We released the model weights. Check out the [7B](https://huggingface.co/YuxinJiang/Lion) model!
33
+ - **[May 25, 2023]** We released an [online demo](https://84bc5e1fdfbb976d51.gradio.live/). Try our model there!
34
+ - **[May 23, 2023]** We released the code for training and inference.
35
+
36
+ <!-- :pray: Since our team members are preparing for the PhD Qualifying Exam, we apologize for any possible delay in responding to your questions. We warmly welcome all inquiries and appreciate your constructive feedback :) -->
37
+
38
+ ## Contents
39
+
40
+ 1. [Overview](#overview)
41
+
42
+ 2. [Online Demo](#online-demo)
43
+
44
+ 3. [Recovering Lion weights](#recovering-lion-weights)
45
+
46
+ 4. [Inference](#inference)
47
+
48
+ 5. [Training Process](#training-process)
49
+
50
+ 6. [Evaluation](#evaluation)
51
+
52
+ 7. [Citation](#citation)
53
+
54
+ 8. [Disclaimer](#disclaimer)
55
+
56
+
57
+ ## Overview
58
+ <p align="center">
59
+ <img width="700" height="320" src="https://github.com/YJiangcm/Lion/blob/master/pics/overview.jpg">
60
+ </p>
61
+
62
+ The figure above gives a high-level overview of our adversarial distillation framework, in which we craft a compact Student LLM from a superior closed-source LLM that serves three roles: the **Teacher**, the **Referee**, and the **Generator**. From left to right, each iteration has three stages:
63
+ 1) an _imitation_ stage to align the student’s response with the teacher’s response;
64
+ 2) a _discrimination_ stage to identify hard samples;
65
+ 3) a _generation_ stage to produce new hard samples for escalating the challenges presented to the student model.
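
The three stages can be sketched as a single function. This is a minimal illustration, not the repository's implementation: the callables `teacher`, `fit_student`, `referee`, and `generator`, the toy stand-ins, and the score-gap `threshold` are all hypothetical names and values chosen for the example, and the generation of new easy samples is omitted.

```python
from typing import Callable, List, Tuple

Responder = Callable[[str], str]


def run_iteration(
    train_pool: List[str],
    cache_pool: List[str],
    teacher: Responder,
    fit_student: Callable[[List[Tuple[str, str]]], Responder],
    referee: Callable[[str, str, str], Tuple[float, float]],
    generator: Callable[[List[str]], List[str]],
    threshold: float = 1.0,  # assumed score gap that marks a sample as hard
) -> Tuple[Responder, List[str], List[str]]:
    """Run one imitation / discrimination / generation round."""
    # 1) Imitation: fit the student to the teacher's responses.
    pairs = [(q, teacher(q)) for q in train_pool]
    student = fit_student(pairs)

    # 2) Discrimination: the referee scores both answers; a large gap
    #    in the teacher's favour marks the instruction as hard.
    hard = []
    for q in cache_pool:
        teacher_score, student_score = referee(q, teacher(q), student(q))
        if teacher_score - student_score >= threshold:
            hard.append(q)

    # 3) Generation: produce new instructions resembling the hard ones.
    new_hard = generator(hard)
    return student, hard, new_hard


# Toy stand-ins (hypothetical) so the sketch runs without any API access.
def toy_teacher(instruction: str) -> str:
    return instruction.upper()


def toy_fit_student(pairs: List[Tuple[str, str]]) -> Responder:
    memory = dict(pairs)  # the toy student only memorises what it has seen
    return lambda instruction: memory.get(instruction, "")


def toy_referee(instruction: str, teacher_answer: str, student_answer: str):
    return 10.0, (10.0 if student_answer == teacher_answer else 5.0)


def toy_generator(hard: List[str]) -> List[str]:
    return [f"{q} (variant)" for q in hard]


student, hard, new_hard = run_iteration(
    ["a"], ["a", "b"], toy_teacher, toy_fit_student, toy_referee, toy_generator
)
```

In the real pipeline each callable corresponds to a script call described under Training Process; the toy versions only show how data flows between the stages.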
66
+
67
+
68
+ ## Online Demo
69
+ We will keep our latest models available to try for as long as possible. Feel free to ask Lion questions; we are happy to hear your feedback!
70
+
71
+ [**Demo Link**](https://84bc5e1fdfbb976d51.gradio.live/) (the interface is shown below)
72
+
73
+ <p align="center">
74
+ <img width="800" height="350" src="https://github.com/YJiangcm/Lion/blob/master/pics/english_case2.png">
75
+ </p>
76
+
77
+ Since the training data are English instruction-following examples, we recommend asking questions in English. However, we found that Lion can also understand instructions in other languages to some extent. See the following case:
78
+
79
+ <p align="center">
80
+ <img width="800" height="350" src="https://github.com/YJiangcm/Lion/blob/master/pics/chinese_case.png">
81
+ </p>
82
+
83
+
84
+ ## Recovering Lion weights
85
+ We release Lion weights as delta weights to comply with the LLaMA model license.
86
+
87
+ - [Lion-7B (delta weights)](https://huggingface.co/YuxinJiang/Lion)
88
+
89
+ You can add our delta to the original LLaMA weights to obtain the Lion weights. Instructions:
90
+ 1. Get the original LLaMA weights in the Hugging Face format by following the instructions [here](https://huggingface.co/docs/transformers/main/model_doc/llama).
91
+ 2. Download our delta weights from [Hugging Face](https://huggingface.co/YuxinJiang/Lion).
92
+ 3. Run the following script to recover the Lion weights by applying our delta:
93
+ ```bash
94
+ python src/weight_diff.py recover --path_raw huggyllama/llama-7b --path_diff YuxinJiang/Lion --path_tuned <path_to_store_recovered_weights>
95
+ ```
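
Conceptually, recovery adds the delta to the original weights parameter by parameter. The sketch below illustrates only that idea, using plain Python lists in place of tensors; the function name `apply_delta` and the dictionary layout are assumptions made for this example and are not taken from `src/weight_diff.py`, which may perform additional steps.

```python
from typing import Dict, List

# Parameter name -> flattened values (a stand-in for a model state dict).
StateDict = Dict[str, List[float]]


def apply_delta(raw: StateDict, delta: StateDict) -> StateDict:
    """Return raw + delta, parameter by parameter."""
    if raw.keys() != delta.keys():
        raise ValueError("raw and delta must contain the same parameter names")
    recovered: StateDict = {}
    for name, raw_values in raw.items():
        delta_values = delta[name]
        if len(raw_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        recovered[name] = [r + d for r, d in zip(raw_values, delta_values)]
    return recovered


# Toy example with a single two-element "layer".
raw_weights = {"layer.weight": [1.0, 2.0]}
delta_weights = {"layer.weight": [0.5, -1.0]}
lion_weights = apply_delta(raw_weights, delta_weights)
```

Because the delta alone does not contain the base model, both the original LLaMA weights and the delta are needed to reconstruct a usable checkpoint.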
96
+
97
+ ## Inference
98
+ For inference and training of Lion, please first install the requirements:
99
+ ```bash
100
+ pip install -r requirements.txt
101
+ ```
102
+
103
+ We provide a decoding script for Lion that reads an input file, generates a response for each sample, and consolidates the responses into an output file.
104
+ ```bash
105
+ python src/lion_inference.py \
106
+ --model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
107
+ --data_dir <path_to_input_json_file> \
108
+ --output_dir <path_to_output_json_file> \
109
+ --num_gpus 8
110
+ ```
111
+
112
+
113
+ ## Training Process
114
+ The steps below show one iteration of our adversarial distillation framework.
115
+ ### 1. Imitation Stage
116
+ #### 1.1 Acquire the teacher's response on the Train Pool
117
+
118
+ ```bash
119
+ python src/chatgpt_inference.py \
120
+ -q <path_to_json_file_for_the_Train_Pool> \
121
+ -o <path_to_chatgpt_inference_for_the_Train_Pool> \
122
+ --api_key <your_openai_api_key>
123
+ ```
124
+
125
+ #### 1.2 Instruction-tune the student on the teacher’s response on the Train Pool
126
+
127
+ Fine-tuning was conducted on a machine with 8 A100 80GB GPUs.
128
+
129
+ ```bash
130
+ torchrun --nproc_per_node=8 --master_port=<your_random_port> src/train.py \
131
+ --model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
132
+ --data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
133
+ --bf16 True \
134
+ --output_dir result \
135
+ --num_train_epochs 3 \
136
+ --model_max_length 1024 \
137
+ --per_device_train_batch_size 1 \
138
+ --per_device_eval_batch_size 1 \
139
+ --gradient_accumulation_steps 8 \
140
+ --evaluation_strategy "no" \
141
+ --save_strategy "steps" \
142
+ --save_steps 500 \
143
+ --save_total_limit 1 \
144
+ --learning_rate 2e-5 \
145
+ --weight_decay 0. \
146
+ --warmup_ratio 0.03 \
147
+ --lr_scheduler_type "cosine" \
148
+ --logging_steps 1 \
149
+ --fsdp "full_shard auto_wrap" \
150
+ --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
151
+ --tf32 True
152
+ ```
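
The batch-related flags above combine into an effective batch size of 8 GPUs × 1 sample per device × 8 accumulation steps = 64. The sketch below works through that arithmetic; the helper names are hypothetical, and the step and warmup counts are approximations of how a trainer typically derives them rather than values taken from `src/train.py`.

```python
import math


def effective_batch_size(num_gpus: int, per_device_batch_size: int,
                         grad_accum_steps: int) -> int:
    """Number of samples that contribute to one optimizer update."""
    return num_gpus * per_device_batch_size * grad_accum_steps


def optimizer_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Approximate total optimizer updates over all epochs."""
    return math.ceil(num_examples / batch_size) * num_epochs


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Approximate number of updates spent in learning-rate warmup."""
    return math.ceil(total_steps * warmup_ratio)


# Values from the command above: --nproc_per_node=8,
# --per_device_train_batch_size 1, --gradient_accumulation_steps 8.
batch = effective_batch_size(num_gpus=8, per_device_batch_size=1,
                             grad_accum_steps=8)

# The dataset size here is a made-up figure for illustration only.
total = optimizer_steps(num_examples=6400, batch_size=batch, num_epochs=3)
warmup = warmup_steps(total, warmup_ratio=0.03)
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` proportionally keeps the effective batch size unchanged.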
153
+
154
+ ### 2. Discrimination Stage
155
+ #### 2.1 Acquire the teacher's response on the Cache Pool
156
+
157
+ ```bash
158
+ python src/chatgpt_inference.py \
159
+ -q <path_to_json_file_for_the_Cache_Pool> \
160
+ -o <path_to_chatgpt_inference_for_the_Cache_Pool> \
161
+ --api_key <your_openai_api_key>
162
+ ```
163
+
164
+ #### 2.2 Acquire the student's response on the Cache Pool
165
+
166
+ ```bash
167
+ python src/lion_inference.py \
168
+ --model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
169
+ --data_dir <path_to_json_file_for_the_Cache_Pool> \
170
+ --output_dir <path_to_lion_inference_for_the_Cache_Pool> \
171
+ --num_gpus 8
172
+ ```
173
+
174
+ #### 2.3 Ask the referee to output two scores according to the response quality of the teacher and the student
175
+
176
+ ```bash
177
+ python src/chatgpt_referee.py \
178
+ -a <path_to_chatgpt_inference_for_the_Cache_Pool> <path_to_lion_inference_for_the_Cache_Pool> \
179
+ -o <path_to_output_review_file> \
180
+ --api_key <your_openai_api_key>
181
+ ```
182
+
183
+ #### 2.4 Discriminate hard instructions and easy instructions
184
+
185
+ ```bash
186
+ python src/discrimination.py \
187
+ --review_path <path_to_output_review_file> \
188
+ --chatgpt_inference_path <path_to_chatgpt_inference_for_the_Cache_Pool> \
189
+ --lion_inference_path <path_to_lion_inference_for_the_Cache_Pool> \
190
+ --hard_save_path <path_to_identified_hard_instructions> \
191
+ --easy_save_path <path_to_identified_easy_instructions>
192
+ ```
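
One simple way to separate hard from easy instructions is to threshold the gap between the referee's two scores. The sketch below assumes this rule; the field names `instruction`, `teacher_score`, and `student_score` and the default `threshold` of 1.0 are hypothetical and may differ from what `src/discrimination.py` actually reads and uses.

```python
from typing import Dict, List, Tuple, Union

Review = Dict[str, Union[str, float]]


def split_hard_easy(reviews: List[Review],
                    threshold: float = 1.0) -> Tuple[List[str], List[str]]:
    """Split instructions by how far the student trails the teacher."""
    hard: List[str] = []
    easy: List[str] = []
    for review in reviews:
        gap = float(review["teacher_score"]) - float(review["student_score"])
        # A gap at or above the threshold means the student clearly lags.
        if gap >= threshold:
            hard.append(str(review["instruction"]))
        else:
            easy.append(str(review["instruction"]))
    return hard, easy


# Toy referee output (made-up scores).
toy_reviews = [
    {"instruction": "Prove the claim.", "teacher_score": 9.0, "student_score": 6.0},
    {"instruction": "Say hello.", "teacher_score": 8.0, "student_score": 8.0},
]
hard_instructions, easy_instructions = split_hard_easy(toy_reviews)
```

The two resulting lists play the role of the files written to `--hard_save_path` and `--easy_save_path`, which then seed the Generation Stage.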
193
+
194
+ ### 3. Generation Stage
195
+ Set `openai.api_key = "<your_openai_api_key>"` in [src/utils.py](https://github.com/YJiangcm/Lion/blob/master/src/utils.py).
196
+ #### 3.1 Generate new hard instructions
197
+
198
+ ```bash
199
+ python -m src.generate_hard_instruction generate_instruction_following_data \
200
+ --seed_tasks_path <path_to_identified_hard_instructions> \
201
+ --output_dir <path_to_generated_hard_instructions> \
202
+ --num_instructions_to_generate 3000
203
+ ```
204
+ #### 3.2 Generate new easy instructions
205
+ ```bash
206
+ python -m src.generate_easy_instruction generate_instruction_following_data \
207
+ --seed_tasks_path <path_to_identified_easy_instructions> \
208
+ --output_dir <path_to_generated_easy_instructions> \
209
+ --num_instructions_to_generate 3000
210
+ ```
211
+
212
+ ## Evaluation
213
+
214
+ ### Automatic Evaluation with GPT-4
215
+ We leverage GPT-4 to automatically rate the response quality (with scores from 1 to 10) of two models on 80 unseen [Vicuna-Instructions](https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/table/question.jsonl).
216
+ ChatGPT serves as the reference model against which we estimate the relative capability of other LLMs. The relative score is reported as a percentage, computed as the ratio of a model's total score to ChatGPT's total score.
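
The relative score can be sketched as follows. The function name `relative_score` and the sample scores are made up for illustration; only the formula (sum of the model's scores divided by the sum of the reference model's scores, as a percentage) comes from the description above.

```python
from typing import Sequence


def relative_score(model_scores: Sequence[float],
                   reference_scores: Sequence[float]) -> float:
    """Percentage ratio of summed model scores to summed reference scores."""
    if len(model_scores) != len(reference_scores):
        raise ValueError("both models must be scored on the same questions")
    reference_total = sum(reference_scores)
    if reference_total == 0:
        raise ValueError("reference scores must not sum to zero")
    return 100.0 * sum(model_scores) / reference_total


# Two made-up questions, each scored from 1 to 10.
example = relative_score(model_scores=[8, 9], reference_scores=[10, 10])
```

Because the sums are taken before dividing, a question on which both models score highly carries more weight than it would under a per-question average of ratios.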
217
+
218
+ **Relative Overall Response Quality**:
219
+
220
+ <p align="center">
221
+ <img width="500" height="250" src="https://github.com/YJiangcm/Lion/blob/master/pics/relative_quality_overall.jpg">
222
+ </p>
223
+
224
+ **Relative Response Quality of Diverse Task Categories**:
225
+
226
+ <p align="center">
227
+ <img width="700" height="330" src="https://github.com/YJiangcm/Lion/blob/master/pics/relative_quality_category.jpg">
228
+ </p>
229
+
230
+ ### Human Evaluation with Alignment Criteria
231
+ We adopt the alignment criteria proposed by Askell et al. (2021), under which an assistant is considered aligned if it is helpful, honest, and
232
+ harmless (HHH). We performed a human evaluation on 252 [UserOriented-Instructions](https://github.com/yizhongw/self-instruct/blob/main/human_eval/user_oriented_instructions.jsonl). To estimate the win rate, we compare the frequencies of wins, ties, and losses between each pair
233
+ of models below.
234
+
235
+ <p align="center">
236
+ <img width="500" height="300" src="https://github.com/YJiangcm/Lion/blob/master/pics/252task_win.jpg">
237
+ </p>
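
Win, tie, and loss frequencies for one pair of models can be tallied as in the sketch below. The label strings and the function name `outcome_rates` are assumptions made for this example; the judgments shown are invented and are not results from the human evaluation.

```python
from collections import Counter
from typing import Dict, Sequence

OUTCOMES = ("win", "tie", "lose")


def outcome_rates(judgments: Sequence[str]) -> Dict[str, float]:
    """Fraction of comparisons that ended in a win, a tie, or a loss."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    counts = Counter(judgments)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(judgments)
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}


# Four invented judgments for a single model pair.
rates = outcome_rates(["win", "win", "tie", "lose"])
```

Reporting all three frequencies, rather than wins alone, distinguishes a model that often ties from one that often loses.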
238
+
239
+
240
+ ## Citation
241
+ Please cite our paper if you use the code in this repo.
242
+
243
+ ```
244
+ @article{DBLP:journals/corr/abs-2305-12870,
245
+ author = {Yuxin Jiang and
246
+ Chunkit Chan and
247
+ Mingyang Chen and
248
+ Wei Wang},
249
+ title = {Lion: Adversarial Distillation of Closed-Source Large Language Model},
250
+ journal = {CoRR},
251
+ volume = {abs/2305.12870},
252
+ year = {2023},
253
+ url = {https://doi.org/10.48550/arXiv.2305.12870},
254
+ doi = {10.48550/arXiv.2305.12870},
255
+ eprinttype = {arXiv},
256
+ eprint = {2305.12870},
257
+ biburl = {https://dblp.org/rec/journals/corr/abs-2305-12870.bib},
258
+ bibsource = {dblp computer science bibliography, https://dblp.org}
259
+ }
260
+ ```
261
+
262
+
263
+
264
+
265
+ ## Disclaimer
266
+ ⚠️ Lion is intended and licensed for **research use ONLY**. Commercial use is **strictly prohibited**.
267
+ The content produced by any version of Lion is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project.
268
+ This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.