
pos_tagger_mT5

This model is a fine-tuned version of google/mt5-small; the fine-tuning dataset is not documented in this card. It achieves the following results on the evaluation set:

  • Loss: 0.2412
  • BLEU: 43.0091
  • Gen Len: 15.8396

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15
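With lr_scheduler_type set to linear and no warmup steps listed, the learning rate decays linearly from 2e-05 at the first step to zero at the last. A minimal pure-Python sketch of that schedule, assuming the 29,265 total optimizer steps shown in the results table below:

```python
def linear_lr(step, base_lr=2e-05, total_steps=29265):
    """Linear decay from base_lr at step 0 down to 0 at total_steps.

    Mirrors a linear schedule with zero warmup; total_steps is taken
    from the final row of the training-results table (epoch 15).
    """
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining

lr_start = linear_lr(0)      # full learning rate at the first step: 2e-05
lr_end = linear_lr(29265)    # fully decayed at the last step: 0.0
```

Halfway through training (around step 14,632, epoch 7–8) the rate has dropped to roughly 1e-05, which coincides with the steepest BLEU gains in the table.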

Training results

Training Loss   Epoch   Step    Validation Loss   BLEU      Gen Len
1.4428          1.0     1951    1.0474            5.7868    9.7778
1.1516          2.0     3902    0.9576            10.5757   12.2847
1.0597          3.0     5853    0.8899            13.9347   14.9239
0.9884          4.0     7804    0.8302            16.1689   15.3619
0.9326          5.0     9755    0.7601            18.7725   14.7579
0.8743          6.0     11706   0.6490            23.2392   15.3214
0.7841          7.0     13657   0.5002            30.4345   15.5328
0.6894          8.0     15608   0.4017            35.8055   15.7107
0.6169          9.0     17559   0.3348            39.2237   15.7143
0.5643          10.0    19510   0.2959            40.8199   15.7694
0.5362          11.0    21461   0.2716            41.8776   15.8021
0.515           12.0    23412   0.2573            42.4591   15.8157
0.4902          13.0    25363   0.2470            42.7953   15.8316
0.47            14.0    27314   0.2420            42.9749   15.8346
0.4635          15.0    29265   0.2412            43.0091   15.8396
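Although the training data is not documented, its approximate size can be recovered from the table: each epoch runs 1,951 optimizer steps at a batch size of 4, so, assuming no gradient accumulation (none is listed), the training split holds roughly 7,800 examples. A quick check:

```python
import math

train_batch_size = 4
steps_per_epoch = 29265 // 15  # 1951 steps per epoch, from the results table

# With no gradient accumulation, steps_per_epoch = ceil(n_examples / batch_size),
# which bounds the number of training examples from both sides.
max_examples = steps_per_epoch * train_batch_size          # 7804
min_examples = (steps_per_epoch - 1) * train_batch_size + 1  # 7801

# Sanity check: the upper bound reproduces the observed step count.
assert math.ceil(max_examples / train_batch_size) == steps_per_epoch
```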

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
Model size: 300M parameters (F32, Safetensors)