Update README.md
README.md
@@ -19,3 +19,10 @@ Pretrained T5 model with nanoT5:
- handles whitespaces etc correctly (unlike standard T5 tokenizer)
- 1024 ctx during pretrain
- `relative_attention_num_buckets` increased to 48 from standard 32 for context length upscaling
+
+## Experiment logs
+
+Training consisted of two phases:
+
+- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512
+- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024
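
For context on the bucket change above, here is a minimal sketch of how the wider relative-position bucketing could be set when instantiating a T5 model with Hugging Face `transformers`; the checkpoint name is a hypothetical placeholder, not one taken from this repo.

```python
# Minimal sketch, assuming the Hugging Face `transformers` T5 classes;
# the checkpoint name below is a hypothetical placeholder.
from transformers import T5Config, T5ForConditionalGeneration

# Wider relative-position bucketing than stock T5 (48 buckets instead of 32),
# matching the config change noted above for context-length upscaling to 1024.
config = T5Config(relative_attention_num_buckets=48)
model = T5ForConditionalGeneration(config)  # randomly initialized with the custom config

# When loading a released checkpoint, the setting comes from its saved config instead:
# model = T5ForConditionalGeneration.from_pretrained("<org>/<model-name>")
# assert model.config.relative_attention_num_buckets == 48
```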