
md_mt5_2611_retrain_v20_imst

This model is a fine-tuned version of Buseak/md_mt5_1911_v18_retrain on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1318
  • Bleu: 2.1555
  • Gen Len: 18.766
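
A minimal usage sketch, assuming the checkpoint is published under the Hub id Buseak/md_mt5_2611_retrain_v20_imst and follows the standard mT5 sequence-to-sequence interface; the repository id and the example input are placeholders, since the intended task is not documented in this card:

```python
# Hedged sketch: the Hub repository id and the example input below are
# assumptions, not something this card documents.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Buseak/md_mt5_2611_retrain_v20_imst"  # assumed Hub id for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("example input sentence", return_tensors="pt")  # placeholder input
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```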

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15
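
A sketch of how these settings might map onto transformers Seq2SeqTrainingArguments; the output directory and the evaluation strategy are assumptions, since the actual training script is not included in this card:

```python
# Hedged sketch: only the listed hyperparameters come from this card;
# output_dir and the evaluation strategy are placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="md_mt5_2611_retrain_v20_imst",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=15,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the Trainer's
    # default optimizer settings, so no explicit optimizer arguments are needed.
    evaluation_strategy="epoch",   # assumption: the results table reports one eval per epoch
    predict_with_generate=True,    # assumption: needed to compute BLEU and generation length
)
```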

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:-------:|
| 0.4201        | 1.0   | 916   | 0.2039          | 1.9544 | 18.7368 |
| 0.396         | 2.0   | 1832  | 0.1862          | 2.0084 | 18.7398 |
| 0.363         | 3.0   | 2748  | 0.1757          | 2.0274 | 18.7475 |
| 0.3559        | 4.0   | 3664  | 0.1681          | 2.0622 | 18.7573 |
| 0.3389        | 5.0   | 4580  | 0.1606          | 2.0798 | 18.7614 |
| 0.3338        | 6.0   | 5496  | 0.1528          | 2.1003 | 18.7581 |
| 0.3192        | 7.0   | 6412  | 0.1476          | 2.1099 | 18.7622 |
| 0.3162        | 8.0   | 7328  | 0.1449          | 2.1171 | 18.7554 |
| 0.2958        | 9.0   | 8244  | 0.1405          | 2.1293 | 18.7559 |
| 0.3012        | 10.0  | 9160  | 0.1379          | 2.1451 | 18.7606 |
| 0.2909        | 11.0  | 10076 | 0.1352          | 2.1459 | 18.7644 |
| 0.2869        | 12.0  | 10992 | 0.1338          | 2.1467 | 18.7628 |
| 0.2875        | 13.0  | 11908 | 0.1324          | 2.1467 | 18.7639 |
| 0.2875        | 14.0  | 12824 | 0.1317          | 2.155  | 18.7649 |
| 0.2836        | 15.0  | 13740 | 0.1318          | 2.1555 | 18.766  |
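
The Bleu and Gen Len columns are consistent with a standard Seq2SeqTrainer evaluation loop. The sketch below illustrates how such metrics are commonly computed with the evaluate library; it is an assumption for illustration, not this model's actual evaluation code:

```python
# Hedged sketch: common pattern for BLEU and generation length in a
# Seq2SeqTrainer setup; not taken from this model's training script.
import numpy as np
import evaluate

sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds, tokenizer):
    # In practice the tokenizer is usually bound via a closure or functools.partial,
    # since the Trainer passes only the (predictions, labels) pair.
    preds, labels = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace the -100 padding used for labels before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["score"], "gen_len": gen_len}
```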

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0
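
A small sketch for checking that a local environment matches the versions listed above (the version strings are taken from this card; the check itself is only an illustration):

```python
# Hedged sketch: compare installed framework versions against those listed in the card.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.35.2",
    "torch": "2.1.0+cu118",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, version in expected.items():
    marker = "OK" if installed[name] == version else "MISMATCH"
    print(f"{name}: expected {version}, found {installed[name]} ({marker})")
```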