andreaskoepf committed on
Commit cecdd87
1 Parent(s): 94537a6

Update README.md

Files changed (1)
  1. README.md +47 -2
README.md CHANGED
@@ -4,8 +4,6 @@ license: other
 
 # OpenAssistant LLaMa 30B SFT 6
 
- - **Paper:** https://arxiv.org/abs/2304.07327
-
 Due to the license attached to LLaMa models by Meta AI it is not possible to directly distribute LLaMa-based models. Instead we provide XOR weights for the OA models.
 
 Thanks to Mick for writing the `xor_codec.py` script which enables this process
@@ -140,3 +138,50 @@ ae48c4c68e4e171d502dd0896aa19a84 ./pytorch_model-00002-of-00007.bin
 ```
 
 If so you have successfully decoded the weights and should be able to use the model with HuggingFace Transformers. **If your checksums do not match those above, there is a problem.**
+
+ ### Configuration
+
+ ```
+ llama-30b-sft-6:
+   dtype: fp16
+   log_dir: "llama_log_30b"
+   learning_rate: 1e-5
+   model_name: /home/ubuntu/Open-Assistant/model/model_training/.saved/llama-30b-super-pretrain/checkpoint-3500
+   output_dir: llama_model_30b
+   deepspeed_config: configs/zero3_config_sft.json
+   weight_decay: 0.0
+   residual_dropout: 0.0
+   max_length: 2048
+   use_flash_attention: true
+   warmup_steps: 20
+   gradient_checkpointing: true
+   gradient_accumulation_steps: 16
+   per_device_train_batch_size: 2
+   per_device_eval_batch_size: 3
+   eval_steps: 101
+   save_steps: 292
+   num_train_epochs: 8
+   save_total_limit: 3
+   use_custom_sampler: true
+   sort_by_length: false
+   save_strategy: steps
+   datasets:
+     - oasst_export:
+         lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
+         input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
+         val_split: 0.05
+     - vicuna:
+         val_split: 0.05
+         max_val_set: 800
+         fraction: 0.8
+     - dolly15k:
+         val_split: 0.05
+         max_val_set: 300
+     - grade_school_math_instructions:
+         val_split: 0.05
+     - code_alpaca:
+         val_split: 0.05
+         max_val_set: 250
+ ```
+
+ - **OASST dataset paper:** https://arxiv.org/abs/2304.07327
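
A note on the XOR-weight scheme referenced in the diff above: the distributed files are XOR combinations of the OA fine-tuned weights with the original LLaMa weights, so applying the same XOR operation against the original weights recovers the OA model. The snippet below is a minimal conceptual sketch of that idea, assuming plain byte-wise XOR over file contents; it is not the actual `xor_codec.py`, and all file names are placeholders.

```
# Conceptual sketch only: byte-wise XOR of two files. The real xor_codec.py
# from the Open-Assistant repository handles the actual checkpoint formats.
def xor_files(xor_path, base_path, out_path, chunk_size=1 << 20):
    with open(xor_path, "rb") as fx, open(base_path, "rb") as fb, open(out_path, "wb") as fo:
        while True:
            a = fx.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            # XOR-ing the distributed chunk with the original LLaMa chunk
            # recovers the corresponding OA chunk (and vice versa).
            fo.write(bytes(x ^ y for x, y in zip(a, b)))

# Placeholder file names, for illustration only.
xor_files("oa_xor_shard.bin", "llama_original_shard.bin", "oa_decoded_shard.bin")
```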
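
The checksum check described in the README ("If your checksums do not match those above, there is a problem") can be reproduced with `md5sum` or, where that tool is unavailable, with a short script such as the sketch below; the directory name is a placeholder.

```
import hashlib
from pathlib import Path

def md5sum(path, chunk_size=1 << 20):
    """Compute an md5 digest without loading the whole file into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder directory containing the decoded model files.
for p in sorted(Path("oasst-sft-6-llama-30b").iterdir()):
    if p.is_file():
        print(f"{md5sum(p)}  ./{p.name}")
```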
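
Once the checksums match, the decoded directory behaves like any other Hugging Face Transformers checkpoint. A minimal loading sketch, assuming the decoded files live in a local directory (the path is a placeholder):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the directory with the decoded OA weights.
model_dir = "./oasst-sft-6-llama-30b"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)
```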
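
For orientation, the effective global batch size implied by the training configuration above is per_device_train_batch_size × gradient_accumulation_steps × (number of data-parallel GPUs); the GPU count is not part of the config file, so it appears in the sketch below as an explicit assumption.

```
# Effective global batch size implied by the config above (sketch).
per_device_train_batch_size = 2   # from the config
gradient_accumulation_steps = 16  # from the config
num_gpus = 8                      # assumption: not specified in the config

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 256 with the assumed 8 GPUs
```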