This is an 11B-parameter T5 model trained on the P3 (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001. Training starts from the T5 v1.1 lm-adapt checkpoint, and the model is fully finetuned.
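
The training setup above can be restated as a small configuration object, which makes the implied training budget easy to compute. This is only a sketch: the class and field names (`TrainingSetup`, `sequences_seen`, and so on) are illustrative and do not come from the actual training code, and the derived counts assume one example per batch slot with no sequence packing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    # Values copied from the description above; the names are hypothetical.
    train_steps: int = 20_000
    batch_size: int = 2048
    max_input_length: int = 1024
    max_output_length: int = 256
    optimizer: str = "adafactor"
    learning_rate: float = 1e-3  # held constant for the whole run

    @property
    def sequences_seen(self) -> int:
        # Total training examples processed (repeats included),
        # assuming one example per batch slot.
        return self.train_steps * self.batch_size

    @property
    def max_input_tokens_per_batch(self) -> int:
        # Upper bound: reached only if every input uses the full length.
        return self.batch_size * self.max_input_length


setup = TrainingSetup()
print(setup.sequences_seen)              # 40960000
print(setup.max_input_tokens_per_batch)  # 2097152
```

Under these assumptions the run processes roughly 41 million examples, with at most about 2.1 million input tokens per batch.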

For more details, see HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation.

Performance on T0 held-out tasks (average accuracy across prompts using rank classification):

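| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| T0-11B | 41.0 | 33.6 | 92.4 | 70.1 | 91.5 | 81.0 | 56.1 | 61.1 | 59.9 | 65.2 |
| hypertask_T0_11B (this model) | 46.8 | 34.1 | 98.2 | 81.2 | 96.6 | 84.0 | 52.1 | 62.6 | 64.8 | 68.9 |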
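
The evaluation protocol named above (rank classification, with accuracy averaged across prompts) can be sketched as follows. This assumes the usual formulation, in which each answer choice is scored by the model (typically the log-likelihood of the choice given the prompt) and the highest-scoring choice is taken as the prediction. The `toy_score` function is a stand-in so the sketch runs without a model; the actual evaluation code for this checkpoint is not shown here, and all names below are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a scorer maps (prompt, choice) to a score, and an
# example is (input text, answer choices, index of the correct choice).
Scorer = Callable[[str, str], float]
Template = Callable[[str], str]
Example = Tuple[str, Sequence[str], int]


def rank_classify(prompt: str, choices: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring answer choice."""
    scores = [score(prompt, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def accuracy(examples: Sequence[Example], template: Template, score: Scorer) -> float:
    """Accuracy of rank classification under one prompt template."""
    correct = sum(
        rank_classify(template(text), choices, score) == label
        for text, choices, label in examples
    )
    return correct / len(examples)


def average_over_prompts(
    examples: Sequence[Example], templates: Sequence[Template], score: Scorer
) -> float:
    """Mean of the per-template accuracies."""
    per_prompt = [accuracy(examples, t, score) for t in templates]
    return sum(per_prompt) / len(per_prompt)


def toy_score(prompt: str, choice: str) -> float:
    # Stand-in for a model log-likelihood: favors choices that appear
    # verbatim in the prompt. A real scorer would query the model.
    return 0.0 if choice.lower() in prompt.lower() else -1.0


examples = [
    ("cats are animals", ["plants", "animals"], 1),
    ("the sky is blue", ["blue", "green"], 0),
    ("dogs bark", ["bark", "meow"], 1),  # the toy scorer gets this one wrong
]
templates = [
    lambda text: f"Premise: {text}",
    lambda text: f"{text} Which word fits?",
]

result = average_over_prompts(examples, templates, toy_score)
print(f"{result:.3f}")  # 0.667
```

Replacing `toy_score` with a function that returns the model's log-likelihood for each choice would give a real evaluation loop; whether scores are length-normalized is a design choice this sketch leaves open.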
