pruned-mt5-small

This model is a fine-tuned version of X-Wang/pruned-mt5-small. The training dataset is not specified (the card records it as None). It achieves the following results on the evaluation set:

  • Loss: 2.4431
  • Bleu: 11.4084
  • Gen Len: 16.1053
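
For readers who prefer perplexity to raw loss, the conversion can be sketched as below. This assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for sequence-to-sequence training in Transformers but is not stated on this card; the variable names are illustrative only.

```python
import math

# Evaluation loss reported above.
eval_loss = 2.4431

# Assumption: eval_loss is a mean per-token cross-entropy measured in nats,
# in which case perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"approximate perplexity: {perplexity:.2f}")
```

Under that assumption the final checkpoint has a perplexity of roughly 11.5.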

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2
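
To make the relationship between these settings concrete, here is a minimal sketch of how the total batch size and the warmup-plus-cosine learning-rate curve follow from them. It assumes a single training device and the common linear-warmup, half-cosine-decay schedule; `cosine_lr` and `ESTIMATED_TOTAL_STEPS` are hypothetical names, and the step total is only an estimate derived from the last row of the results table (step 56000 at epoch 1.97), not a value stated on the card.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 12
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.01
NUM_EPOCHS = 2

# Assumption: one device, so the effective batch per optimizer step is
# the per-device batch size times the gradient accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 12 * 2 = 24


def cosine_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: total optimizer steps estimated from the results table
# (step 56000 logged at epoch 1.97); the true value is not given on the card.
ESTIMATED_TOTAL_STEPS = round(56000 / 1.97 * NUM_EPOCHS)

for step in (0, 2000, 28000, 56000):
    print(step, f"{cosine_lr(step, ESTIMATED_TOTAL_STEPS):.6f}")
```

With a warmup ratio of 0.01, warmup covers only the first few hundred steps under this estimate, so nearly every logged evaluation falls in the cosine-decay phase.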

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|---|---|---|---|---|---|
| 3.3446 | 0.07 | 2000 | 2.9103 | 10.3957 | 16.0567 |
| 2.8425 | 0.14 | 4000 | 2.8570 | 10.5695 | 16.1895 |
| 3.186 | 0.21 | 6000 | 2.8137 | 10.5958 | 16.1523 |
| 2.788 | 0.28 | 8000 | 2.7593 | 10.7553 | 16.0138 |
| 2.9075 | 0.35 | 10000 | 2.7266 | 10.9199 | 16.2016 |
| 3.0579 | 0.42 | 12000 | 2.7030 | 10.6 | 16.0496 |
| 2.3618 | 0.49 | 14000 | 2.6547 | 10.8026 | 16.0412 |
| 3.079 | 0.56 | 16000 | 2.6441 | 10.7945 | 16.1148 |
| 2.7597 | 0.63 | 18000 | 2.6244 | 10.5877 | 16.0507 |
| 2.8533 | 0.7 | 20000 | 2.6049 | 10.9986 | 16.1145 |
| 2.843 | 0.77 | 22000 | 2.5836 | 10.9173 | 16.0826 |
| 2.8268 | 0.84 | 24000 | 2.5685 | 10.8136 | 16.0516 |
| 2.7021 | 0.91 | 26000 | 2.5509 | 11.326 | 16.0554 |
| 3.338 | 0.98 | 28000 | 2.5289 | 11.1485 | 16.0333 |
| 2.7374 | 1.05 | 30000 | 2.5220 | 11.0166 | 16.0998 |
| 2.7996 | 1.12 | 32000 | 2.5077 | 11.1316 | 16.131 |
| 2.6897 | 1.19 | 34000 | 2.4994 | 11.0811 | 16.1139 |
| 2.4107 | 1.26 | 36000 | 2.4877 | 11.2641 | 16.142 |
| 2.7695 | 1.33 | 38000 | 2.4756 | 11.2135 | 16.0977 |
| 3.3271 | 1.41 | 40000 | 2.4658 | 11.3328 | 16.0953 |
| 2.2641 | 1.48 | 42000 | 2.4612 | 11.3065 | 16.0549 |
| 2.6594 | 1.55 | 44000 | 2.4556 | 11.2684 | 16.1371 |
| 2.7322 | 1.62 | 46000 | 2.4520 | 11.3739 | 16.1058 |
| 2.6824 | 1.69 | 48000 | 2.4462 | 11.3335 | 16.1043 |
| 2.3369 | 1.76 | 50000 | 2.4455 | 11.3851 | 16.1239 |
| 2.9537 | 1.83 | 52000 | 2.4430 | 11.4026 | 16.0858 |
| 2.3928 | 1.9 | 54000 | 2.4433 | 11.301 | 16.1129 |
| 2.4714 | 1.97 | 56000 | 2.4431 | 11.4084 | 16.1053 |

Framework versions

  • Transformers 4.31.0
  • PyTorch 2.0.0
  • Datasets 2.13.1
  • Tokenizers 0.13.3