---
license: apache-2.0
base_model: google/mt5-small
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: md_mt5_base_boun_split_first_v2
  results: []
---

# md_mt5_base_boun_split_first_v2

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.4874
- Bleu: 0.5931
- Gen Len: 18.7836

## Model description

More information needed

## Intended uses & limitations

More information needed
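Until the intended-use section is filled in, a minimal inference sketch may help. It assumes the checkpoint is published on the Hub under `Buseak/md_mt5_base_boun_split_first_v2` (an assumed repo id; adjust to the actual path) and uses the standard `transformers` seq2seq API:

```python
# Hedged sketch: loading this fine-tuned mT5 checkpoint for inference.
# The repo id below is an assumption based on the model name; change it
# to wherever the checkpoint is actually hosted.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "Buseak/md_mt5_base_boun_split_first_v2"  # assumed Hub path

def generate(text: str, max_length: int = 20) -> str:
    """One seq2seq generation pass (eval Gen Len averaged ~18.8 tokens)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Example call; downloads the checkpoint on first use.
    print(generate("example input sentence"))
```

The guard keeps the network call out of module import, so the function can be reused from other scripts.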

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 15
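As a rough illustration of the `linear` scheduler above (assuming 0 warmup steps, the Trainer default): the learning rate decays linearly from 2e-05 to 0 over 14,625 optimizer steps, i.e. 975 steps per epoch for 15 epochs. This is a plain-Python sketch of the same formula, not the Trainer's internal scheduler code:

```python
# Sketch of linear learning-rate decay (lr_scheduler_type: linear, no warmup).
# The step counts come from this model card's training-results table.

INITIAL_LR = 2e-05
STEPS_PER_EPOCH = 975
NUM_EPOCHS = 15
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 14625

def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates (0 <= step <= TOTAL_STEPS)."""
    remaining = max(0, TOTAL_STEPS - step)
    return INITIAL_LR * remaining / TOTAL_STEPS

print(linear_lr(0))            # start of training: 2e-05
print(linear_lr(TOTAL_STEPS))  # end of training: 0.0
```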

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:-------:|
| 14.1813       | 1.0   | 975   | 2.7812          | 1.1085 | 18.9862 |
| 2.9943        | 2.0   | 1950  | 1.4463          | 1.4366 | 18.7331 |
| 1.9645        | 3.0   | 2925  | 1.0962          | 0.5916 | 18.7738 |
| 1.5852        | 4.0   | 3900  | 0.8990          | 0.5837 | 18.6944 |
| 1.3504        | 5.0   | 4875  | 0.7589          | 0.5952 | 18.7164 |
| 1.1926        | 6.0   | 5850  | 0.6843          | 0.6057 | 18.7367 |
| 1.0963        | 7.0   | 6825  | 0.6291          | 0.5969 | 18.7197 |
| 1.0192        | 8.0   | 7800  | 0.5902          | 0.6007 | 18.7428 |
| 0.9537        | 9.0   | 8775  | 0.5614          | 0.5879 | 18.7492 |
| 0.9127        | 10.0  | 9750  | 0.5366          | 0.5871 | 18.7667 |
| 0.8705        | 11.0  | 10725 | 0.5166          | 0.5841 | 18.7718 |
| 0.8472        | 12.0  | 11700 | 0.5041          | 0.5869 | 18.7777 |
| 0.8312        | 13.0  | 12675 | 0.4963          | 0.5917 | 18.7821 |
| 0.8243        | 14.0  | 13650 | 0.4890          | 0.5944 | 18.7838 |
| 0.8099        | 15.0  | 14625 | 0.4874          | 0.5931 | 18.7836 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0