lbergen committed
Commit dc4fc25 · verified · 1 Parent(s): ffe1872

Upload 2 files

Pretraining/config_tree.txt ADDED
@@ -0,0 +1,157 @@
+ CONFIG
+ ├── train
+ │   └── seed: 2222
+ │       interval: step
+ │       monitor: test/loss
+ │       mode: min
+ │       ema: 0.0
+ │       test: false
+ │       debug: false
+ │       ignore_warnings: false
+ │       state:
+ │         mode: null
+ │         n_context: 0
+ │         n_context_eval: 0
+ │       ckpt: null
+ │       disable_dataset: false
+ │       validate_at_start: false
+ │       pretrained_model_path: null
+ │       pretrained_model_strict_load: true
+ │       pretrained_model_state_hook:
+ │         _name_: null
+ │       post_init_hook:
+ │         _name_: null
+ │       layer_decay:
+ │         _name_: null
+ │         decay: 0.7
+ │       gpu_mem: 82
+ │       global_batch_size: 144
+ │
+ ├── tolerance
+ │   └── logdir: ./resume
+ │       id: null
+ │
+ ├── wandb
+ │   └── project: rna-llm
+ │       group: ''
+ │       job_type: training
+ │       mode: online
+ │       name: null
+ │       save_dir: .
+ │       id: null
+ │
+ ├── trainer
+ │   └── _target_: pytorch_lightning.Trainer
+ │       devices: 6
+ │       accelerator: gpu
+ │       accumulate_grad_batches: 6
+ │       max_epochs: 2
+ │       gradient_clip_val: 1.0
+ │       log_every_n_steps: 10
+ │       limit_train_batches: 1.0
+ │       limit_val_batches: 1.0
+ │       num_nodes: 1
+ │       precision: bf16
+ │
+ ├── loader
+ │   └── batch_size: 50
+ │       num_workers: 4
+ │       pin_memory: true
+ │       drop_last: true
+ │
+ ├── dataset
+ │   └── _name_: mrna
+ │       fasta_directory: /workspace/data/mrna/
+ │       dataset_name: mrna
+ │       tokenizer_name: char
+ │       cache_dir: null
+ │       max_length: 8192
+ │       add_eos: true
+ │       batch_size: 4
+ │       batch_size_eval: 8
+ │       num_workers: 12
+ │       shuffle: true
+ │       pin_memory: true
+ │       max_length_val: 8192
+ │       max_length_test: 8192
+ │       pad_max_length: null
+ │       rc_aug: false
+ │       use_fixed_len_val: false
+ │
+ ├── optimizer
+ │   └── _name_: adamw
+ │       lr: 0.0006
+ │       weight_decay: 0.1
+ │       betas:
+ │       - 0.9
+ │       - 0.999
+ │
+ ├── scheduler
+ │   └── _name_: cosine_warmup_timm
+ │       t_in_epochs: false
+ │       t_initial: 24000
+ │       lr_min: 5.9999999999999995e-05
+ │       warmup_lr_init: 1.0e-06
+ │       warmup_t: 2000
+ │
+ ├── callbacks
+ │   └── learning_rate_monitor:
+ │         logging_interval: step
+ │       timer:
+ │         step: true
+ │         inter_step: false
+ │         epoch: true
+ │         val: true
+ │       params:
+ │         total: true
+ │         trainable: true
+ │         fixed: true
+ │       model_checkpoint:
+ │         monitor: test/loss
+ │         mode: min
+ │         save_top_k: 1
+ │         save_last: true
+ │         dirpath: checkpoints/
+ │         filename: test/loss
+ │         auto_insert_metric_name: false
+ │         verbose: true
+ │
+ ├── task
+ │   └── _name_: lm
+ │       loss:
+ │         _name_: cross_entropy
+ │         ignore_index: 4
+ │       torchmetrics:
+ │       - perplexity
+ │       - num_tokens
+ │
+ ├── encoder
+ │   └── None
+ ├── decoder
+ │   └── None
+ └── model
+     └── _name_: lm
+         d_model: 1024
+         n_layer: 24
+         d_inner: 4096
+         vocab_size: 12
+         resid_dropout: 0.0
+         embed_dropout: 0.1
+         fused_mlp: false
+         fused_dropout_add_ln: false
+         checkpoint_mixer: false
+         checkpoint_mlp: false
+         residual_in_fp32: true
+         pad_vocab_size_multiple: 8
+         layer:
+           _name_: hyena
+           emb_dim: 5
+           filter_order: 64
+           local_order: 3
+           l_max: 8194
+           modulate: true
+           w: 10
+           lr: 0.0006
+           wd: 0.0
+           lr_pos_emb: 0.0
+
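Note that the batch arithmetic in this tree is internally consistent: per-device dataset batch_size 4 × devices 6 × accumulate_grad_batches 6 = 144, which matches train.global_batch_size. Below is a minimal sketch of how a Hydra-style tree like this is typically turned into objects. It is not this repo's training entry point: the path config.yaml and the stand-in model are hypothetical. Only trainer carries an explicit _target_, so only it can be built directly by Hydra; optimizer uses a registry-style _name_, for which plain torch.optim.AdamW with the listed hyperparameters is the equivalent.

```python
# Sketch only: consuming a config shaped like the tree above.
# Assumptions (not in this commit): the config is stored as YAML at
# "config.yaml", and a small nn.Module stands in for the Hyena LM.
import torch
import torch.nn as nn
from omegaconf import OmegaConf
from hydra.utils import instantiate

cfg = OmegaConf.load("config.yaml")  # hypothetical path

# trainer has _target_: pytorch_lightning.Trainer, so Hydra can build it:
# Trainer(devices=6, accelerator="gpu", accumulate_grad_batches=6,
#         max_epochs=2, gradient_clip_val=1.0, ..., precision="bf16")
trainer = instantiate(cfg.trainer)

model = nn.Linear(8, 8)  # stand-in for the Hyena LM built from cfg.model

# optimizer._name_: adamw maps, registry-style, onto torch.optim.AdamW
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=cfg.optimizer.lr,                      # 0.0006
    weight_decay=cfg.optimizer.weight_decay,  # 0.1
    betas=tuple(cfg.optimizer.betas),         # (0.9, 0.999)
)
```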
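The scheduler name cosine_warmup_timm suggests timm's CosineLRScheduler. With t_in_epochs: false its counters are update steps, so the learning rate warms up linearly from 1e-06 to 6e-04 over 2,000 steps, then decays cosine-style to lr_min (5.9999999999999995e-05, i.e. 0.1 × lr up to float rounding) by step 24,000. A hedged sketch, assuming that timm mapping is correct:

```python
# Sketch: what scheduler._name_: cosine_warmup_timm plausibly resolves to.
# Assumes timm's CosineLRScheduler; stepping is per update step because
# t_in_epochs is false.
from timm.scheduler import CosineLRScheduler

scheduler = CosineLRScheduler(
    optimizer,            # the AdamW instance from the previous sketch
    t_initial=24000,      # cosine horizon, in update steps
    lr_min=6e-05,         # floor, ~0.1 * base lr
    warmup_t=2000,        # linear warmup steps
    warmup_lr_init=1.0e-06,
    t_in_epochs=False,
)

num_updates = 0
for step in range(24000):
    # ... forward / backward / optimizer.step() ...
    num_updates += 1
    scheduler.step_update(num_updates=num_updates)
```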
Pretraining/last.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5da354655edbc656a5952c7b7f7fcb3359ab28b1103e6f13191fd3139b078559
+ size 3658749089
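What is committed here is a Git LFS pointer, not the weights themselves: the ~3.7 GB blob must be fetched first (e.g. `git lfs pull` after cloning, or a hub download). A minimal sketch of inspecting the checkpoint once downloaded; the key names below are the usual PyTorch Lightning ones and are an assumption about this particular file.

```python
# Sketch: inspecting last.ckpt after the LFS blob has been fetched.
import torch

ckpt = torch.load("Pretraining/last.ckpt", map_location="cpu")
# Typical Lightning keys: epoch, global_step, state_dict, optimizer_states, ...
print(sorted(ckpt.keys()))

state_dict = ckpt["state_dict"]
n_params = sum(t.numel() for t in state_dict.values())
print(f"{len(state_dict)} tensors, {n_params / 1e6:.0f}M parameters")
```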