
mt5-small

This model was trained from scratch on the TEC-JL Japanese learner error corpus. It achieves the following results on the evaluation set:

  • Loss: 0.0758
  • Bleu: 67.2605
  • Gen Len: 13.051

Model description

More information needed

Intended uses & limitations

More information needed
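Although the card does not yet document intended uses, the training data (a Japanese learner error corpus) suggests grammatical error correction for learner-written Japanese. A minimal usage sketch follows; the checkpoint path and the example sentence are illustrative, not taken from the card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint location; substitute the actual repo id or local path.
model_name = "path/to/mt5-small-tec-jl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Illustrative learner sentence with a conjugation error ("見るました").
source = "私は昨日映画を見るました。"
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```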

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 12

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|
| 1.0485        | 1.0   | 3125  | 0.7139          | 0.0061  | 13.051  |
| 0.2413        | 2.0   | 6250  | 0.1114          | 53.3974 | 13.056  |
| 0.1153        | 3.0   | 9375  | 0.0937          | 61.71   | 13.056  |
| 0.0918        | 4.0   | 12500 | 0.0867          | 63.8407 | 13.056  |
| 0.0819        | 5.0   | 15625 | 0.0833          | 65.2015 | 13.056  |
| 0.08          | 6.0   | 18750 | 0.0806          | 65.6513 | 13.056  |
| 0.078         | 7.0   | 21875 | 0.0793          | 66.3861 | 13.051  |
| 0.0704        | 8.0   | 25000 | 0.0779          | 66.6447 | 13.051  |
| 0.0724        | 9.0   | 28125 | 0.0759          | 67.2105 | 13.051  |
| 0.0707        | 10.0  | 31250 | 0.0765          | 67.3232 | 13.051  |
| 0.0682        | 11.0  | 34375 | 0.0761          | 67.3443 | 13.051  |
| 0.07          | 12.0  | 37500 | 0.0758          | 67.2605 | 13.051  |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
Safetensors

  • Model size: 300M params
  • Tensor type: F32