
Google's T5-v1.1-base, pre-trained for 24 hours (80k steps at a batch size of 256) on a single GPU using the nanoT5 library for efficient pre-training.
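
To put the stated budget in perspective, the figures above can be turned into a rough throughput estimate. The snippet below is only a back-of-the-envelope sketch: the helper name `pretraining_budget` is hypothetical, and it assumes the batch size counts sequences and that the full 24 hours were spent on training steps.

```python
def pretraining_budget(steps: int, batch_size: int, hours: float) -> dict:
    """Derive rough throughput numbers from a step count, batch size, and wall-clock time.

    Assumption: batch_size is measured in sequences and all of the
    wall-clock time was spent on training steps.
    """
    seconds = hours * 3600
    sequences_seen = steps * batch_size
    return {
        "sequences_seen": sequences_seen,
        "seconds_per_step": seconds / steps,
        "sequences_per_second": sequences_seen / seconds,
    }


# Figures taken from the description above: 80k steps, batch size 256, 24 hours.
budget = pretraining_budget(steps=80_000, batch_size=256, hours=24)
print(budget["sequences_seen"])                   # 20480000
print(round(budget["seconds_per_step"], 2))       # 1.08
print(round(budget["sequences_per_second"], 1))   # 237.0
```

Under these assumptions the run processes about 20.5 million sequences in total, at a little over one second per optimizer step.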

For more details about the model, refer to the original paper and the original model weights.

It can be further fine-tuned on the SuperNatural-Instructions dataset to achieve performance comparable to that of the same model pre-trained on 150x more data through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.


Dataset used to train pnawrot/nanoT5-base