
byt5-small-wikipron-eng-latn-in-broad

This model is a fine-tuned version of google/byt5-small. The training dataset is not recorded in the auto-generated card (it appears as "None"); going by the model name, it is presumably the WikiPron English (India, Latin script, broad transcription) pronunciation data. The model achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.1796
  • PER (phoneme error rate): 0.3243
  • Gen Len (mean generated sequence length): 16.3969
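
A minimal usage sketch, assuming the model performs word-to-IPA grapheme-to-phoneme conversion as the WikiPron naming suggests; the repo id below is abbreviated from the card title and may need the full hub path:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Abbreviated repo id taken from the card title; replace with the full
# "user/byt5-small-wikipron-eng-latn-in-broad" hub path if needed.
model_id = "byt5-small-wikipron-eng-latn-in-broad"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# ByT5 operates directly on UTF-8 bytes, so the word goes in as plain text.
inputs = tokenizer("pronunciation", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)

# Expected output is a broad IPA transcription (assumption based on the
# WikiPron naming convention).
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because ByT5 consumes raw UTF-8 bytes, no word-level preprocessing or custom vocabulary is needed on the input side.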

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Seq2SeqTrainingArguments follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
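
For concreteness, a hedged sketch of how these values map onto transformers' Seq2SeqTrainingArguments; output_dir, evaluation_strategy, and predict_with_generate are assumptions, and the Adam betas/epsilon listed above are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="byt5-small-wikipron-eng-latn-in-broad",  # hypothetical path
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 32 x 4 = 128 effective train batch size
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    evaluation_strategy="epoch",  # assumption: the results table is per-epoch
    predict_with_generate=True,   # assumption: needed for PER / Gen Len metrics
)
```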

Training results

| Training Loss | Epoch | Step | Validation Loss | PER    | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|
| 2.2292        | 1.0   | 233  | 0.3751          | 0.4995 | 16.0446 |
| 0.387         | 2.0   | 466  | 0.2547          | 0.4174 | 16.3408 |
| 0.2887        | 3.0   | 699  | 0.2157          | 0.362  | 16.3243 |
| 0.2459        | 4.0   | 932  | 0.2019          | 0.3432 | 16.3488 |
| 0.2224        | 5.0   | 1165 | 0.1925          | 0.3339 | 16.3654 |
| 0.207         | 6.0   | 1398 | 0.1868          | 0.328  | 16.3744 |
| 0.1956        | 7.0   | 1631 | 0.1816          | 0.3276 | 16.3906 |
| 0.1872        | 8.0   | 1864 | 0.1815          | 0.3241 | 16.3957 |
| 0.1823        | 9.0   | 2097 | 0.1812          | 0.3244 | 16.3927 |
| 0.1778        | 10.0  | 2330 | 0.1796          | 0.3243 | 16.3969 |
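
PER in the table is read as phoneme error rate: Levenshtein distance between the predicted and reference phoneme sequences, normalized by reference length. A minimal sketch of the metric (a hypothetical helper, not the exact evaluation code behind this card):

```python
def phoneme_error_rate(pred: list[str], ref: list[str]) -> float:
    """Levenshtein distance over reference length; assumes ref is non-empty."""
    # d[i][j] = edit distance between pred[:i] and ref[:j]
    d = [[0] * (len(ref) + 1) for _ in range(len(pred) + 1)]
    for i in range(len(pred) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(pred) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost  # substitution
            )
    return d[len(pred)][len(ref)] / len(ref)

# One substitution out of five reference phonemes -> 0.2
print(phoneme_error_rate(list("hɛloʊ"), list("həloʊ")))
```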

Framework versions

  • Transformers 4.29.2
  • PyTorch 2.0.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3