
nanoT5-mid-65kBPE-2048

This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.
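
For example, a minimal sketch of loading the checkpoint for downstream fine-tuning with the transformers library (the repo id matches this card; the task prefix and data in the toy step below are hypothetical):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "pszemraj/nanoT5-mid-65kBPE-2048"

# load the adapted 65k-BPE tokenizer and the raw pretrained seq2seq weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# toy supervised step on hypothetical data, just to show the fine-tuning API;
# a real run would use an optimizer loop or Trainer over a downstream dataset
inputs = tokenizer("summarize: a long input document ...", return_tensors="pt")
labels = tokenizer("a short target summary", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()
```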

A "mid" size T5 model pretrained on c4:

  • trained at a context length of 2048
  • 16 layers, hidden size 1024, feed-forward size 3072, SiLU activations
  • pretrained on allenai/c4 (en subset) for 65k steps
  • uses an adapted claude3 tokenizer; vocab size 65k
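
The specs above can be sanity-checked against the loaded config; a small sketch continuing from the snippet above (attribute names are standard T5Config fields, and the expected values are taken from this list):

```python
# continuing from the loading snippet above
cfg = model.config
print(cfg.num_layers, cfg.d_model, cfg.d_ff)  # expect 16, 1024, 3072 per the specs
print(cfg.feed_forward_proj)                  # the activation (a SiLU variant)
print(cfg.vocab_size)                         # ~65k, matching the adapted BPE tokenizer
```

Note that T5 uses relative position buckets rather than absolute position embeddings, so the 2048-token training context is a training-time choice rather than a hard limit in the config; quality beyond that length is not guaranteed.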

More details and logs can be found under checkpoints/.

Model size: 637M params (F32, Safetensors)
