pszemraj committed on
Commit db5a103
1 Parent(s): 1fd7fb6

Update README.md

Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -19,3 +19,10 @@ Pretrained T5 model with nanoT5:
19    - handles whitespaces etc correctly (unlike standard T5 tokenizer)
20    - 1024 ctx during pretrain
21    - `relative_attention_num_buckets` increased to 48 from standard 32 for context length upscaling
22  +
23  + ## Experiment logs
24  +
25  + Training consisted of two phases:
26  +
27  + - [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512
28  + - [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024