flan-t5-xl-instructiongen

This model is a fine-tuned version of google/flan-t5-xl on the pszemraj/fleece2instructions dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8314
  • Rouge1: 65.3297
  • Rouge2: 48.8475
  • Rougel: 63.4183
  • Rougelsum: 63.5458
  • Gen Len: 13.7474
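To make the ROUGE numbers above easier to interpret, here is a simplified sketch of how a ROUGE-1 F-score on the same 0 to 100 scale can be computed for one prediction/reference pair. It is not the evaluation script used for this model: it assumes plain whitespace tokenization and lowercasing, with no stemming, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings, scaled to 0-100.

    Simplified illustration: whitespace tokens, lowercased, no stemming.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("Write a poem about cats", "Write a poem"))
```

Rouge2 applies the same idea to bigrams, and RougeL is based on the longest common subsequence rather than n-gram counts.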

Model description

More information needed

Intended uses & limitations

Generate (or recover) the instruction that could have produced an arbitrary piece of text. The model assumes the text corresponds to a single instruction with no separate inputs.
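A minimal inference sketch for this use case, assuming the model is published under the repository ID `pszemraj/flan-t5-xl-instructiongen` and loads with the standard `transformers` seq2seq classes. The generation settings (`max_new_tokens=48`, `num_beams=4`, a 512-token input limit) and the helper names `load_model` and `recover_instruction` are illustrative assumptions, not values documented by the author.

```python
MODEL_ID = "pszemraj/flan-t5-xl-instructiongen"  # assumed repository ID


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model (downloads several GB of weights)."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def recover_instruction(text, tokenizer, model, max_new_tokens: int = 48) -> str:
    """Generate the instruction that might have produced `text`."""
    inputs = tokenizer(
        text.strip(),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumption: truncate long inputs
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # assumption: outputs are short
        num_beams=4,                    # assumption: beam search
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()


if __name__ == "__main__":
    tokenizer, model = load_model()
    sample = (
        "Roses are red, violets are blue, "
        "this short rhyme was written for you."
    )
    print(recover_instruction(sample, tokenizer, model))
```

The average generated length reported in the evaluation results is under 14 tokens, so a small `max_new_tokens` budget should usually be enough.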

Training and evaluation data

Refer to the pszemraj/fleece2instructions dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6e-05
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2.0
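The hyperparameters above could be expressed as `transformers` training arguments roughly as follows. This is a sketch, not the author's training script: it assumes `train_batch_size` and `eval_batch_size` are per-device values, that `Seq2SeqTrainingArguments` was used, and that the installed `transformers` version accepts these argument names. The output directory and helper names are hypothetical.

```python
# Values copied from the hyperparameter list; keys are the
# corresponding Seq2SeqTrainingArguments argument names.
HYPERPARAMS = {
    "learning_rate": 6e-5,
    "per_device_train_batch_size": 4,  # assumption: listed size is per device
    "per_device_eval_batch_size": 2,   # assumption: listed size is per device
    "seed": 42,
    "gradient_accumulation_steps": 16,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "num_train_epochs": 2.0,
}


def samples_per_update_per_device(params: dict) -> int:
    """Examples one device accumulates before each optimizer step."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
    )


def build_training_args(output_dir: str = "./instructiongen-output"):
    """Build the arguments object (hypothetical output directory)."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        predict_with_generate=True,  # needed for ROUGE during evaluation
        **HYPERPARAMS,
    )


if __name__ == "__main__":
    print(samples_per_update_per_device(HYPERPARAMS))
```

With the listed values, the per-device batch size times the accumulation steps is 4 × 16 = 64, which matches the reported total_train_batch_size.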

Training results

Training Loss  Epoch  Step  Validation Loss  Rouge1   Rouge2   Rougel   Rougelsum  Gen Len
0.9615         1.0    362   0.8353           63.9163  47.0456  61.9554  62.0549    13.3737
0.809          2.0    724   0.8251           64.5398  47.9107  62.5928  62.7278    13.4763