Don't use this model for any applied task. It is too small to be practically useful; it exists only as part of a weird research project.

An extremely small version of T5 with the following parameters:

  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 256,
  "num_heads": 4,
  "num_layers": 1,  # yes, just one layer

The model was pre-trained on the realnewslike subset of C4 for 1 epoch with a sequence length of 64. Corresponding WandB run: click.
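
A rough sketch of how that pre-training data could be streamed and tokenized; the dataset id (`allenai/c4`, `realnewslike` config) and the use of a stock T5 tokenizer are assumptions for illustration, not a record of the actual training script:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the realnewslike subset of C4 (assumed to be the allenai/c4 mirror).
dataset = load_dataset("allenai/c4", "realnewslike", split="train", streaming=True)

# A stock T5 tokenizer, truncating to the sequence length of 64 used for pre-training.
tokenizer = AutoTokenizer.from_pretrained("t5-small")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=64)

tokenized = dataset.map(tokenize, batched=True)
print(next(iter(tokenized)).keys())
```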
