Yukang committed
Commit e582534
1 Parent(s): 400d445

Update README.md

Files changed (1): README.md (+12 -9)
README.md CHANGED
@@ -95,11 +95,11 @@ We did not use the `input` format in the Alpaca format for simplicity.
  ## Models

  ### Models with supervised fine-tuning
- | Model | Size | Context | Train | Link |
- |:---------------|------|---------|---------|---------------------------------------------------------------|
- | LongAlpaca-7B | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B) |
- | LongAlpaca-13B | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B) |
- | LongAlpaca-70B | 70B | 32768 | LoRA+ | [(Model)](https://huggingface.co/Yukang/LongAlpaca-70B-lora) |
+ | Model | Size | Context | Train | Link |
+ |:---------------|------|---------|---------|-------------------------------------------------------------|
+ | LongAlpaca-7B | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B) |
+ | LongAlpaca-13B | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B) |
+ | LongAlpaca-70B | 70B | 32768 | LoRA+ | [Model](https://huggingface.co/Yukang/LongAlpaca-70B-lora) |


  ### Models with context extension via fully fine-tuning
@@ -135,6 +135,9 @@ We use LLaMA2 models as the pre-trained weights and fine-tune them to long conte
  | [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) |
  |[Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) |
  | [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) |
+ | [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
+ | [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) |
+ | [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) |

  This project also supports GPTNeoX models as the base model architecture. Some candidate pre-trained weights may include [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), [Polyglot-ko-12.8B](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) and other variants.

@@ -179,12 +182,12 @@ cd path_to_saving_checkpoints && python zero_to_fp32.py . pytorch_model.bin
  ### Supervised Fine-tuning
  ```
  torchrun --nproc_per_node=8 supervised-fine-tune.py \
- --model_name_or_path path_to_finetuned_models \
+ --model_name_or_path path_to_Llama2_chat_models \
  --bf16 True \
  --output_dir path_to_saving_checkpoints \
  --model_max_length 32768 \
  --use_flash_attn True \
- --data_path LongQA.json \
+ --data_path LongAlpaca-12k.json \
  --low_rank_training True \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
@@ -202,8 +205,8 @@ torchrun --nproc_per_node=8 supervised-fine-tune.py \
  --deepspeed "ds_configs/stage2.json" \
  --tf32 True
  ```
- - We typically make supervised fine-tuning upon the fine-tuned context extended models, `path_to_finetuned_models`, like `Llama-2-13b-longlora-32k` or `Llama-2-13b-longlora-32k-ft`.
- - During our dataset collection, it is hard for us to collect many high-quality QA that are larger than 32768. Thus, if you use our `LongQA.json`, please also set `model_max_length` as 32768.
+ - There is no need to run supervised fine-tuning on top of the context-extended models. It is fine to use the Llama-2-chat models directly as the base, since the amount of long instruction-following data is sufficient for SFT.
+ - Our long instruction-following data can be found in [LongAlpaca-12k.json](https://huggingface.co/datasets/Yukang/LongAlpaca-12k).


  ### Get trainable weights in low-rank training
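
For context on the updated SFT recipe in the diff above, here is a minimal, illustrative sketch (not part of the commit) of how one might sanity-check the referenced LongAlpaca-12k data against the `--model_max_length 32768` budget before launching `supervised-fine-tune.py`. It assumes the `datasets` and `transformers` libraries, access to the gated `meta-llama/Llama-2-7b-chat-hf` tokenizer, and that the dataset loads under its default configuration; the dataset's field names are not assumed, so the script simply reports whatever columns it finds.

```python
# Illustrative sketch only: check the long instruction-following data against
# the 32768-token budget used by --model_max_length in the diff above.
from datasets import load_dataset
from transformers import AutoTokenizer

# Any of the chat checkpoints listed in the diff can serve as the SFT base model.
base_model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# The long instruction-following data referenced by the commit.
data = load_dataset("Yukang/LongAlpaca-12k", split="train")
print("columns:", data.column_names)

# Token count of the first record, joining all of its text fields.
sample_text = " ".join(str(value) for value in data[0].values())
n_tokens = len(tokenizer(sample_text).input_ids)
print(f"first sample: {n_tokens} tokens (model_max_length budget: 32768)")
```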