# nanoT5-mid-65kBPE-2048
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.

A "mid"-size T5 model pretrained on C4:
- trained at context length 2048
- 16 layers, hidden size 1024, feed-forward dimension 3072, SiLU activations
- pretrained on `allenai/c4` (`en` subset) for 65k steps
- uses an adapted claude3 tokenizer; vocab size 65k
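
As a minimal sketch, the checkpoint should load for fine-tuning with the `transformers` library as below; the repo id is a placeholder (substitute the actual Hub path), and the config attribute names assume a standard `T5Config`:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# placeholder repo id -- substitute the actual path of this model on the Hub
model_id = "your-org/nanoT5-mid-65kBPE-2048"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# sanity-check the architecture against the card above
cfg = model.config
print(cfg.num_layers, cfg.d_model, cfg.d_ff)  # expect 16, 1024, 3072

# the checkpoint is "raw" (pretraining only), so pass it to a
# fine-tuning pipeline (e.g. Seq2SeqTrainer) before expecting useful outputs
```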
More details and training logs can be found under `checkpoints/` in this repo.