cs_mT5-large2_2e-5_50_v0.1

This model is a fine-tuned version of google/mt5-large on an unknown dataset. At the final checkpoint (epoch 50, step 300), it achieves the following results on the evaluation set:

  • Loss: 4.5108
  • Bleu: 19.8919
  • Gen Len: 17.7619

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
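
To make the linear schedule concrete, the following is a minimal sketch of how the learning rate would decay under these settings. It assumes no warmup (the card lists none) and 300 total optimizer steps (the last step in the results table). `linear_lr` is a hypothetical helper written for illustration, not a function from Transformers.

```python
def linear_lr(step, base_lr=2e-05, total_steps=300, warmup_steps=0):
    """Learning rate at a given optimizer step under a linear schedule.

    Assumptions (not stated in the card): warmup_steps=0, total_steps=300.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 150, 300):
        print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, is halved by step 150 (epoch 25), and reaches zero at step 300.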

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|---|---|---|---|---|---|
| 16.9199 | 1.0 | 6 | 10.5138 | 9.6354 | 19.0 |
| 9.9396 | 2.0 | 12 | 8.3590 | 8.988 | 19.0 |
| 19.1783 | 3.0 | 18 | 7.4137 | 8.7723 | 19.0 |
| 9.8097 | 4.0 | 24 | 7.3182 | 8.8796 | 19.0 |
| 16.8467 | 5.0 | 30 | 7.2232 | 8.6892 | 19.0 |
| 9.745 | 6.0 | 36 | 6.9902 | 7.822 | 19.0 |
| 6.2948 | 7.0 | 42 | 6.8174 | 8.2013 | 19.0 |
| 6.3194 | 8.0 | 48 | 6.7064 | 7.6678 | 19.0 |
| 6.927 | 9.0 | 54 | 6.6122 | 9.9162 | 19.0 |
| 7.198 | 10.0 | 60 | 6.5138 | 13.3863 | 19.0 |
| 7.6505 | 11.0 | 66 | 6.4263 | 12.4078 | 19.0 |
| 7.9063 | 12.0 | 72 | 6.3326 | 13.0376 | 19.0 |
| 9.021 | 13.0 | 78 | 6.2376 | 13.6209 | 19.0 |
| 9.2462 | 14.0 | 84 | 6.1222 | 13.3871 | 19.0 |
| 7.7924 | 15.0 | 90 | 5.9968 | 14.1604 | 19.0 |
| 5.1947 | 16.0 | 96 | 5.8706 | 11.7859 | 19.0 |
| 9.9564 | 17.0 | 102 | 5.7396 | 13.4904 | 19.0 |
| 5.2706 | 18.0 | 108 | 5.6295 | 13.5218 | 19.0 |
| 6.6567 | 19.0 | 114 | 5.5203 | 14.0857 | 19.0 |
| 5.0918 | 20.0 | 120 | 5.3965 | 15.3213 | 19.0 |
| 6.2442 | 21.0 | 126 | 5.2742 | 15.6508 | 19.0 |
| 4.5073 | 22.0 | 132 | 5.1884 | 15.8637 | 19.0 |
| 3.3254 | 23.0 | 138 | 5.1282 | 14.7385 | 19.0 |
| 6.9905 | 24.0 | 144 | 5.0841 | 15.5385 | 19.0 |
| 6.3553 | 25.0 | 150 | 5.0408 | 16.9058 | 19.0 |
| 4.8396 | 26.0 | 156 | 5.0165 | 16.3831 | 19.0 |
| 4.7646 | 27.0 | 162 | 4.9914 | 16.2156 | 19.0 |
| 3.6864 | 28.0 | 168 | 4.9643 | 16.4319 | 19.0 |
| 4.7526 | 29.0 | 174 | 4.9186 | 17.5044 | 19.0 |
| 4.5518 | 30.0 | 180 | 4.8727 | 16.7818 | 19.0 |
| 3.9017 | 31.0 | 186 | 4.8264 | 16.9433 | 19.0 |
| 4.6864 | 32.0 | 192 | 4.7818 | 16.8868 | 19.0 |
| 3.0676 | 33.0 | 198 | 4.7505 | 18.2291 | 19.0 |
| 5.9861 | 34.0 | 204 | 4.7214 | 18.3309 | 19.0 |
| 5.0304 | 35.0 | 210 | 4.7003 | 18.3309 | 19.0 |
| 3.9478 | 36.0 | 216 | 4.6791 | 18.1004 | 19.0 |
| 4.9706 | 37.0 | 222 | 4.6651 | 17.787 | 19.0 |
| 5.0404 | 38.0 | 228 | 4.6401 | 17.787 | 19.0 |
| 4.938 | 39.0 | 234 | 4.6045 | 18.6261 | 17.7619 |
| 5.7176 | 40.0 | 240 | 4.5833 | 17.1931 | 17.7619 |
| 3.3352 | 41.0 | 246 | 4.5654 | 17.1931 | 17.7619 |
| 4.8397 | 42.0 | 252 | 4.5517 | 17.6767 | 17.7619 |
| 4.401 | 43.0 | 258 | 4.5441 | 17.1931 | 17.7619 |
| 5.4609 | 44.0 | 264 | 4.5370 | 17.5969 | 17.7619 |
| 4.9223 | 45.0 | 270 | 4.5295 | 19.1503 | 17.7619 |
| 4.092 | 46.0 | 276 | 4.5215 | 19.1133 | 17.7619 |
| 3.3364 | 47.0 | 282 | 4.5159 | 19.1133 | 17.7619 |
| 4.9208 | 48.0 | 288 | 4.5131 | 19.8919 | 17.7619 |
| 3.5934 | 49.0 | 294 | 4.5115 | 19.8919 | 17.7619 |
| 4.5551 | 50.0 | 300 | 4.5108 | 19.8919 | 17.7619 |
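
The step counts above advance by 6 per epoch, which, combined with train_batch_size 16, bounds the size of the training set. The sketch below works through that arithmetic. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of this is stated in the card. `example_count_bounds` is a hypothetical helper.

```python
import math


def example_count_bounds(steps_per_epoch, batch_size):
    """Smallest and largest dataset size n with ceil(n / batch_size) == steps_per_epoch."""
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


# Values taken from the hyperparameters and the results table.
total_steps, num_epochs, train_batch_size = 300, 50, 16

steps_per_epoch = total_steps // num_epochs  # 6
low, high = example_count_bounds(steps_per_epoch, train_batch_size)

if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"implied training examples: {low} to {high}")
```

Under those assumptions the training set would hold between 81 and 96 examples, which is worth keeping in mind when reading the noisy training-loss column.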

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size: 1.23B params (Safetensors, tensor type F32)
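
As a rough sizing aid, the weights-only footprint follows from the parameter count and tensor type listed above. This is a back-of-the-envelope sketch: it assumes exactly 1.23 billion parameters (the card's figure is rounded) and 4 bytes per F32 value, and it ignores activations, optimizer state, and framework overhead.

```python
# Assumed values: the rounded parameter count from the card, 4 bytes per float32.
num_params = 1_230_000_000
bytes_per_param = 4

weight_bytes = num_params * bytes_per_param
weight_gb = weight_bytes / 1e9        # decimal gigabytes
weight_gib = weight_bytes / 2**30     # binary gibibytes

if __name__ == "__main__":
    print(f"{weight_gb:.2f} GB ({weight_gib:.2f} GiB) for weights alone")
```

Training with Adam needs substantially more memory than this, since gradients and two optimizer moments are stored per parameter.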