ft-wmt14-5

This model is a fine-tuned version of google/mt5-small on the lilferrit/wmt14-short dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0604
  • Bleu: 20.7584
  • Gen Len: 30.499

Model description

More information needed

Intended uses & limitations

More information needed
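
As a minimal usage sketch (the card gives no official example), the checkpoint should load with the standard Transformers seq2seq API; the translation direction below is an assumption, since WMT14 covers several language pairs and the card does not state which one was used:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lilferrit/ft-wmt14-5")
model = AutoModelForSeq2SeqLM.from_pretrained("lilferrit/ft-wmt14-5")

# Assumed German -> English; adjust to the pair the checkpoint was trained on.
inputs = tokenizer("Das Wetter ist heute schön.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```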

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent Seq2SeqTrainingArguments configuration follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • training_steps: 100000
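
As a sketch (not confirmed by the card), these settings map onto a Seq2SeqTrainingArguments configuration roughly as follows; the output directory and the predict_with_generate flag are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="ft-wmt14-5",        # assumed; not stated on the card
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 8 * 2 = 16 effective train batch size
    optim="adafactor",              # Adafactor via the built-in optim option
    lr_scheduler_type="constant",
    max_steps=100_000,
    predict_with_generate=True,     # assumed, since BLEU is reported at eval
)
```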

Training results

Training Loss   Epoch    Step     Validation Loss   Bleu      Gen Len
1.9166          0.2778   10000    2.3105            15.8119   32.097
1.7184          0.5556   20000    2.1993            17.5903   31.1153
1.6061          0.8333   30000    2.1380            18.9604   30.327
1.5160          1.1111   40000    2.1366            19.1444   30.2727
1.4675          1.3889   50000    2.1208            19.7588   30.1127
1.4416          1.6667   60000    2.0889            19.9263   30.4463
1.4111          1.9444   70000    2.0795            20.3323   30.1207
1.3603          2.2222   80000    2.0850            20.5373   30.5943
1.3378          2.5000   90000    2.0604            20.7584   30.499
1.3381          2.7778   100000   2.0597            20.6113   30.701
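
The card does not say how Bleu was computed; a common setup for WMT-style evaluation (an assumption, not confirmed here) is sacreBLEU via the evaluate library:

```python
import evaluate

# Hypothetical example: sacreBLEU over decoded model outputs.
bleu = evaluate.load("sacrebleu")
predictions = ["The weather is nice today."]   # decoded model outputs
references = [["The weather is nice today."]]  # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))               # 100.0 for an exact match
```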

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.2.2+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1