---
license: apache-2.0
---

Experimental pre-training on instruction datasets.

Training run: https://wandb.ai/open-assistant/supervised-finetuning/runs/ys9rt5ue

Checkpoint: 3500 steps

oasst dataset config used:
```
pretrain:
  use_custom_sampler: true
  sort_by_length: false
  datasets:
    - joke
    - webgpt:
        val_split: 0.1
    - gpt4all:
        val_split: 0.01
    - alpaca:
        val_split: 0.025
    - code_alpaca:
        val_split: 0.05
    - minimath
    - humaneval_mbpp_codegen_qa
    - humaneval_mbpp_testgen_qa
    - grade_school_math_instructions
    - recipes
    - cmu_wiki_qa
    #- youtube_subs_howto100m # uses incompatible column names
    #- ubuntu_dialogue_qa # fails to load
    - oa_wiki_qa_bart_10000row
    - prosocial_dialogue:
        fraction: 0.1
    - explain_prosocial:
        fraction: 0.05
```
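
For illustration, here is a minimal sketch of how a config shaped like this can be parsed: entries under `datasets` are either bare dataset names or single-key mappings carrying per-dataset options such as `val_split` or `fraction`. The file name and the loader below are assumptions for the example, not the actual Open-Assistant training code:

```python
import yaml  # PyYAML

# "pretrain_config.yaml" is a hypothetical file name for the block above.
with open("pretrain_config.yaml") as f:
    cfg = yaml.safe_load(f)["pretrain"]

# Each list entry is either a bare name ("joke") or a one-key mapping
# ("webgpt: {val_split: 0.1}") holding that dataset's options.
for entry in cfg["datasets"]:
    if isinstance(entry, dict):
        (name, options), = entry.items()
    else:
        name, options = entry, {}
    print(name, options)
```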

pythia parameters:
```
pythia-12b:
  dtype: fp16
  log_dir: "pythia_log_12b"
  learning_rate: 6e-6
  model_name: EleutherAI/pythia-12b-deduped
  output_dir: pythia_model_12b
  weight_decay: 0.0
  max_length: 2048
  use_flash_attention: true
  #deepspeed_config: configs/zero3_config.json
  warmup_steps: 50
  gradient_checkpointing: true
  gradient_accumulation_steps: 8
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 5
  eval_steps: 200
  save_steps: 500
  num_train_epochs: 2
  save_total_limit: 2
```
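
Note that with `gradient_accumulation_steps: 8` and `per_device_train_batch_size: 2`, each optimizer step sees an effective batch of 16 sequences per device. As a usage example, the following is a minimal inference sketch with the standard `transformers` API; the repo id is a placeholder for this checkpoint's actual Hub path, and the prompt and sampling settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; replace with this checkpoint's actual Hub path.
model_id = "your-org/pythia-12b-deduped-instruction-pretrain"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the run above trained in fp16
    device_map="auto",          # requires the `accelerate` package
)

prompt = "Explain what a lambda function is in Python."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```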