tFINE-900m-e16-d32-1024ctx

T5 model pretrained with nanoT5:

  • ~900m parameters; 16 encoder layers, 32 decoder layers
  • SentencePiece tokenizer with a 48k vocab and byte-pair fallback
    • handles whitespace etc. correctly (unlike the original T5 tokenizer)
  • 1024 context length during pretraining
  • relative_attention_num_buckets increased from 32 to 48 to support the longer context
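The bucket increase matters because T5's relative position bias maps each query–key offset to a learned bucket. A pure-Python sketch mirroring the bucketing scheme from the Hugging Face transformers T5 implementation (simplified here to scalar inputs, for illustration only) shows how raising the count from 32 to 48 extends long-range resolution:

```python
import math

def relative_position_bucket(
    relative_position: int,
    bidirectional: bool = True,
    num_buckets: int = 48,   # this model raises the T5 default of 32 to 48
    max_distance: int = 128,
) -> int:
    """Map a (memory_pos - query_pos) offset to a relative-attention bucket.

    Scalar re-implementation of the T5 scheme: half the buckets cover exact
    small offsets, the rest cover larger offsets logarithmically.
    """
    bucket = 0
    if bidirectional:
        num_buckets //= 2
        if relative_position > 0:  # keys to the right get the upper half
            bucket += num_buckets
        relative_position = abs(relative_position)
    else:
        relative_position = max(-relative_position, 0)
    max_exact = num_buckets // 2
    if relative_position < max_exact:
        return bucket + relative_position
    # logarithmic binning for larger distances, capped at the last bucket
    val_if_large = max_exact + int(
        math.log(relative_position / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(val_if_large, num_buckets - 1)

# With 32 buckets, all offsets beyond ~128 collapse into the same cap bucket;
# with 48 buckets the cap moves higher, giving finer resolution at 1024 ctx.
print(relative_position_bucket(-1000, num_buckets=32))  # 15
print(relative_position_bucket(-1000, num_buckets=48))  # 23
```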

Experiment logs

Training consisted of two phases:

  • phase one: ~30k steps at context length 512
  • phase two: 20k steps at context length 1024
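For reference, a minimal loading sketch via the Hugging Face transformers Auto classes (the helper name and example prompt are illustrative, not from the training code):

```python
def load_model(repo_id: str = "pszemraj/tFINE-900m-e16-d32-1024ctx"):
    """Load the tokenizer and seq2seq model from the Hub (lazy import)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    return tokenizer, model

# Example (requires network; ~3.5 GB of F32 weights). As a plain pretrained
# checkpoint, the model fills span-corruption sentinels rather than following
# instructions:
#
# tokenizer, model = load_model()
# inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=8)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```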
