
An 11B-parameter T5 model trained on the P3 (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001. The model is initialized from the T5 v1.1 lm-adapt checkpoint and fully finetuned.
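The training setup described above can be written down as a small config object, which also makes the implied data budget explicit. This is only an illustrative sketch: the `TrainingSetup` class and its field names are hypothetical and do not come from the released training code, and it assumes the batch size is counted in sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the hyperparameters listed in this card."""

    steps: int = 20_000
    batch_size: int = 2048          # assumed to be sequences per step
    max_input_length: int = 1024
    max_output_length: int = 256
    optimizer: str = "adafactor"
    learning_rate: float = 1e-3     # constant, no schedule

    @property
    def sequences_seen(self) -> int:
        # Total training sequences processed over the whole run.
        return self.steps * self.batch_size


setup = TrainingSetup()
print(setup.sequences_seen)  # 40960000
```

Under that assumption, the run processes roughly 41 million training sequences in total.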

For more details, see HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation.

Performance on T0 held-out tasks (average accuracy across prompts using rank classification):

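| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| T0-11B | 41.0 | 33.6 | 92.4 | 70.1 | 91.5 | 81.0 | 56.1 | 61.1 | 59.9 | 65.2 |
| hypertask_T0_11B (this model) | 46.8 | 34.1 | 98.2 | 81.2 | 96.6 | 84.0 | 52.1 | 62.6 | 64.8 | 68.9 |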
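
The scores above use rank classification averaged across prompts. A minimal sketch of that procedure follows, assuming each answer option is scored by the model's log-likelihood and the highest-scoring option is taken as the prediction. The function names and the toy scorer are hypothetical stand-ins, not the evaluation code used for this model; a real scorer would compute the log-likelihood of each option under the model.

```python
from typing import Callable, List, Sequence


def rank_classify(prompt: str, options: Sequence[str],
                  score: Callable[[str, str], float]) -> int:
    """Return the index of the option with the highest score for this prompt."""
    scores = [score(prompt, option) for option in options]
    return max(range(len(options)), key=lambda i: scores[i])


def mean_accuracy_across_prompts(per_prompt_preds: Sequence[Sequence[int]],
                                 labels: Sequence[int]) -> float:
    """Compute accuracy separately for each prompt template, then average."""
    accuracies: List[float] = []
    for preds in per_prompt_preds:
        correct = sum(p == y for p, y in zip(preds, labels))
        accuracies.append(correct / len(labels))
    return sum(accuracies) / len(accuracies)


# Toy scorer with made-up log-likelihoods (hypothetical values, not model output).
TOY_SCORES = {
    ("Is the sky blue?", "yes"): -0.2,
    ("Is the sky blue?", "no"): -1.9,
}


def toy_log_likelihood(prompt: str, option: str) -> float:
    return TOY_SCORES[(prompt, option)]


prediction = rank_classify("Is the sky blue?", ["yes", "no"], toy_log_likelihood)

# Two prompt templates evaluated on the same three examples.
labels = [0, 1, 1]
per_prompt_preds = [[0, 1, 1], [0, 0, 1]]
average = mean_accuracy_across_prompts(per_prompt_preds, labels)
```

Here the first template gets all three examples right and the second gets two of three, so the reported number would be the mean of the two per-template accuracies.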
