md_mt5_1911_v18_retrain

This model is a fine-tuned version of Buseak/md_mt5_1911_v16_deneme. The training dataset is not specified (the auto-generated card lists it as "None"). It achieves the following results on the evaluation set:

  • Loss: 0.1754
  • Bleu: 0.7623
  • Gen Len: 18.7866

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15
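
To make the linear schedule concrete, here is a minimal sketch of how the learning rate would decay over the run. It assumes no warmup steps, a single device, and no gradient accumulation, none of which the card states. The total of 18750 optimizer steps is taken from the last row of the training results. The function name `linear_lr` and its parameters are illustrative, not part of any library.

```python
# Hyperparameters taken from the list above; total steps from the results table.
BASE_LR = 2e-5
NUM_EPOCHS = 15
TOTAL_STEPS = 18750
TRAIN_BATCH_SIZE = 4


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at a given optimizer step under linear decay to zero.

    warmup_steps=0 is an assumption; the card does not report a warmup setting.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
# Rough training-set size, assuming one device and no gradient accumulation.
approx_examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    for epoch in (0, 5, 10, 15):
        step = epoch * steps_per_epoch
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.2e}")
    print("approx. examples per epoch:", approx_examples_per_epoch)
```

Under these assumptions the learning rate starts at 2e-05, is halved midway through the run, and reaches zero at step 18750. Each epoch covers about 5000 training examples.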

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu   | Gen Len |
|---------------|-------|-------|-----------------|--------|---------|
| 0.6365        | 1.0   | 1250  | 0.3435          | 0.6788 | 18.7694 |
| 0.5732        | 2.0   | 2500  | 0.3064          | 0.7037 | 18.7644 |
| 0.5375        | 3.0   | 3750  | 0.2819          | 0.7114 | 18.7706 |
| 0.4912        | 4.0   | 5000  | 0.2549          | 0.7237 | 18.77   |
| 0.4648        | 5.0   | 6250  | 0.2394          | 0.7354 | 18.772  |
| 0.4321        | 6.0   | 7500  | 0.2245          | 0.7335 | 18.7762 |
| 0.4159        | 7.0   | 8750  | 0.2131          | 0.7446 | 18.778  |
| 0.4044        | 8.0   | 10000 | 0.2030          | 0.7478 | 18.7776 |
| 0.3889        | 9.0   | 11250 | 0.1963          | 0.7496 | 18.7852 |
| 0.3798        | 10.0  | 12500 | 0.1896          | 0.7524 | 18.7834 |
| 0.3733        | 11.0  | 13750 | 0.1836          | 0.757  | 18.7854 |
| 0.3623        | 12.0  | 15000 | 0.1803          | 0.7596 | 18.7852 |
| 0.3583        | 13.0  | 16250 | 0.1775          | 0.7618 | 18.7878 |
| 0.3643        | 14.0  | 17500 | 0.1758          | 0.7616 | 18.7828 |
| 0.3609        | 15.0  | 18750 | 0.1754          | 0.7623 | 18.7866 |
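
The per-epoch numbers can be summarized with a short script. This is an illustrative sketch only: the values are copied from the training results above, and the variable names are hypothetical. It finds the epoch with the best BLEU, the overall BLEU gain, and the epochs where BLEU dipped relative to the previous epoch.

```python
# (epoch, training_loss, validation_loss, bleu) copied from the training results.
rows = [
    (1, 0.6365, 0.3435, 0.6788),
    (2, 0.5732, 0.3064, 0.7037),
    (3, 0.5375, 0.2819, 0.7114),
    (4, 0.4912, 0.2549, 0.7237),
    (5, 0.4648, 0.2394, 0.7354),
    (6, 0.4321, 0.2245, 0.7335),
    (7, 0.4159, 0.2131, 0.7446),
    (8, 0.4044, 0.2030, 0.7478),
    (9, 0.3889, 0.1963, 0.7496),
    (10, 0.3798, 0.1896, 0.7524),
    (11, 0.3733, 0.1836, 0.7570),
    (12, 0.3623, 0.1803, 0.7596),
    (13, 0.3583, 0.1775, 0.7618),
    (14, 0.3643, 0.1758, 0.7616),
    (15, 0.3609, 0.1754, 0.7623),
]

# Epoch with the highest BLEU on the evaluation set.
best_epoch, _, _, best_bleu = max(rows, key=lambda r: r[3])

# Absolute BLEU improvement from the first to the last epoch.
bleu_gain = round(rows[-1][3] - rows[0][3], 4)

# Epochs where BLEU dropped compared with the previous epoch.
bleu_dips = [cur[0] for prev, cur in zip(rows, rows[1:]) if cur[3] < prev[3]]

# Whether validation loss fell at every epoch.
val_loss_monotonic = all(cur[2] < prev[2] for prev, cur in zip(rows, rows[1:]))

if __name__ == "__main__":
    print("best epoch:", best_epoch, "BLEU:", best_bleu)
    print("BLEU gain over training:", bleu_gain)
    print("epochs with a BLEU dip:", bleu_dips)
    print("validation loss strictly decreasing:", val_loss_monotonic)
```

Validation loss falls at every epoch, while BLEU dips slightly at epochs 6 and 14 before ending at its highest value in the final epoch. Gains after roughly epoch 12 are small, so the run appears close to converged under this schedule.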

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0

Model weights (Safetensors)

  • Model size: 300M params
  • Tensor type: F32