Update README.md
README.md
@@ -19,3 +19,10 @@ Pretrained T5 model with nanoT5:
- handles whitespaces etc correctly (unlike standard T5 tokenizer)
- 1024 ctx during pretrain
- `relative_attention_num_buckets` increased to 48 from standard 32 for context length upscaling
+
+## Experiment logs
+
+Training consisted of two phases:
+
+- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512
+- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024
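
For context on the bucket change above, here is a minimal sketch of how the wider relative-position bucketing could be set when instantiating a T5 model with Hugging Face `transformers`; the checkpoint name is a hypothetical placeholder, not one taken from this repo.

```python
# Minimal sketch, assuming the Hugging Face `transformers` T5 classes;
# the checkpoint name below is a hypothetical placeholder.
from transformers import T5Config, T5ForConditionalGeneration

# Wider relative-position bucketing than stock T5 (48 buckets instead of 32),
# matching the config change noted above for context-length upscaling to 1024.
config = T5Config(relative_attention_num_buckets=48)
model = T5ForConditionalGeneration(config)  # randomly initialized with the custom config

# When loading a released checkpoint, the setting comes from its saved config instead:
# model = T5ForConditionalGeneration.from_pretrained("<org>/<model-name>")
# assert model.config.relative_attention_num_buckets == 48
```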