andreaskoepf committed · Commit 8cd2153 · 1 Parent(s): cf0da05

Update README.md

Model:
```
falcon-7b:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "tiiuae/falcon-7b"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  warmup_steps: 20
  gradient_checkpointing: true
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 8
  eval_steps: 100
  save_steps: 500
  save_strategy: steps
  num_train_epochs: 8
  save_total_limit: 4
  residual_dropout: 0.2
  residual_dropout_lima: true
```
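As a rough sanity check on the batch settings above, the effective batch size per optimizer step is `per_device_train_batch_size × gradient_accumulation_steps × number of GPUs`. A minimal sketch (the GPU count is an assumption; it is not part of the config):

```python
# Sketch: effective batch size implied by the config above.
per_device_train_batch_size = 4   # from the YAML
gradient_accumulation_steps = 4   # from the YAML
num_gpus = 8                      # assumption, not stated in the config

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 128 under these assumptions
```

Adjust `num_gpus` to match your cluster; the per-device settings stay fixed while DeepSpeed scales across devices.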

Dataset:
```
oasst-top1:
  # oasst_export: 11123 (100.00%)
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
        input_file_path: 2023-06-02_oasst_all_labels.jsonl.gz
        val_split: 0.05
        top_k: 1
```
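The commented sample count (11123 examples) together with `val_split: 0.05` implies the approximate size of the held-out evaluation set. A quick sketch of that arithmetic:

```python
# Sketch: approximate train/val split implied by the dataset config above.
total_examples = 11123   # from the "# oasst_export: 11123 (100.00%)" comment
val_split = 0.05         # from the config

val_examples = int(total_examples * val_split)   # floor; rough estimate only
train_examples = total_examples - val_examples
print(train_examples, val_examples)  # 10567 556
```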

Train command:
```
deepspeed trainer_sft.py --configs defaults falcon-7b oasst-top1 --cache_dir <data_cache_dir> --output_dir <output_path> --deepspeed
```
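The `--configs` flag selects named config sections that are merged left to right, with later sections overriding earlier keys (so `oasst-top1` overrides `falcon-7b`, which overrides `defaults`). A minimal sketch of that merge logic; the `defaults` values here are hypothetical, only the keys from the sections above are real:

```python
# Sketch: later configs override earlier ones, mirroring
# `--configs defaults falcon-7b oasst-top1`.
defaults = {"learning_rate": 2e-5, "save_steps": 500, "eval_steps": 100}   # hypothetical
falcon_7b = {"learning_rate": 1e-5, "save_steps": 500, "eval_steps": 100}  # Model section
oasst_top1 = {"eval_steps": 80, "save_steps": 80}                          # Dataset section

merged = {}
for cfg in (defaults, falcon_7b, oasst_top1):
    merged.update(cfg)  # later dicts win on key collisions

print(merged)  # {'learning_rate': 1e-05, 'save_steps': 80, 'eval_steps': 80}
```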

Export command:
```
python export_model.py --dtype bf16 --hf_repo_name OpenAssistant/falcon-7b-sft-top1 --trust_remote_code --auth_token <auth_token> <output_path> --max_shard_size 2GB
```
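For scale, `--max_shard_size 2GB` bounds each exported weight file, so the shard count follows from the checkpoint size. A back-of-the-envelope sketch (the ~7B parameter count is inferred from the model name, and the exact shard count also depends on where layer boundaries fall):

```python
import math

# Rough shard-count estimate for a bf16 export capped at 2GB per shard.
num_params = 7_000_000_000   # assumption: ~7B, inferred from "falcon-7b"
bytes_per_param = 2          # bf16 is 2 bytes per parameter
shard_limit = 2 * 10**9      # "2GB" read as 2e9 bytes (an assumption)

total_bytes = num_params * bytes_per_param
num_shards = math.ceil(total_bytes / shard_limit)
print(num_shards)  # about 7 shards under these assumptions
```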