byt5_add

This model is a fine-tuned version of google/byt5-small on an unknown dataset. After the final (50th) training epoch, it achieves the following results on the evaluation set:

  • Loss: 0.1606
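To put the loss in more familiar terms, the sketch below converts it to perplexity and bits per token. It assumes the reported value is a natural-log cross-entropy averaged over target tokens, which is how the Trainer normally reports evaluation loss for T5-style models; the card itself does not confirm this. Since ByT5 tokenizes text as UTF-8 bytes, "per token" would then mean roughly "per byte".

```python
import math

EVAL_LOSS = 0.1606  # final evaluation loss reported in this card

# Assumption: the loss is mean cross-entropy in nats per target token.
perplexity = math.exp(EVAL_LOSS)
bits_per_token = EVAL_LOSS / math.log(2)

print(f"perplexity per token: {perplexity:.3f}")
print(f"bits per token:       {bits_per_token:.3f}")
```

Under these assumptions the perplexity comes out at about 1.17. Byte-level figures like this are not comparable with the loss of models that use subword tokenizers.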

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 800
  • eval_batch_size: 800
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
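As a rough illustration of what these settings imply, the sketch below reproduces a linear learning-rate decay in plain Python and derives two quantities from the numbers in this card. The 13 steps per epoch are read off the training results table; the zero warmup, the single device, the absence of gradient accumulation, and keeping the last partial batch are all assumptions, since the card does not state them.

```python
BASE_LR = 5e-5         # learning_rate from the card
BATCH_SIZE = 800       # train_batch_size from the card
NUM_EPOCHS = 50        # num_epochs from the card
STEPS_PER_EPOCH = 13   # from the results table (step 13 at epoch 1.0)
WARMUP_STEPS = 0       # assumption: no warmup is listed in the card

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 650, the last step in the table


def linear_lr(step, base_lr=BASE_LR, total=TOTAL_STEPS, warmup=WARMUP_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


def train_set_size_bounds(steps_per_epoch=STEPS_PER_EPOCH, batch_size=BATCH_SIZE):
    """Range of dataset sizes n with ceil(n / batch_size) == steps_per_epoch.

    Assumes one device, no gradient accumulation, and that the last
    partial batch is kept.
    """
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


if __name__ == "__main__":
    for step in (0, 325, 650):
        print(step, linear_lr(step))
    print("training examples between", train_set_size_bounds())
```

If those assumptions hold, the training set contains between 9,601 and 10,400 examples, and the learning rate has fallen to half its initial value by step 325 (epoch 25).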

Training results

Training Loss  Epoch  Step  Validation Loss
No log         1.0    13    4.2259
No log         2.0    26    2.4178
No log         3.0    39    1.9256
No log         4.0    52    1.7310
No log         5.0    65    1.6577
No log         6.0    78    1.6385
No log         7.0    91    1.6110
No log         8.0    104   1.5811
No log         9.0    117   1.5237
No log         10.0   130   1.4809
No log         11.0   143   1.4378
No log         12.0   156   1.3976
No log         13.0   169   1.3462
No log         14.0   182   1.2587
No log         15.0   195   1.2260
No log         16.0   208   1.1018
No log         17.0   221   1.0273
No log         18.0   234   0.9436
No log         19.0   247   0.8007
No log         20.0   260   0.6919
No log         21.0   273   0.6201
No log         22.0   286   0.5486
No log         23.0   299   0.4804
No log         24.0   312   0.4080
No log         25.0   325   0.3861
No log         26.0   338   0.3477
No log         27.0   351   0.3181
No log         28.0   364   0.2921
No log         29.0   377   0.2832
No log         30.0   390   0.2693
No log         31.0   403   0.2469
No log         32.0   416   0.2453
No log         33.0   429   0.2313
No log         34.0   442   0.2134
No log         35.0   455   0.2139
No log         36.0   468   0.2088
No log         37.0   481   0.2007
No log         38.0   494   0.1960
1.3            39.0   507   0.1830
1.3            40.0   520   0.1782
1.3            41.0   533   0.1746
1.3            42.0   546   0.1741
1.3            43.0   559   0.1708
1.3            44.0   572   0.1668
1.3            45.0   585   0.1650
1.3            46.0   598   0.1651
1.3            47.0   611   0.1629
1.3            48.0   624   0.1627
1.3            49.0   637   0.1610
1.3            50.0   650   0.1606
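Two features of the table can be checked with a short script: why the training loss reads "No log" until epoch 39, and where the validation loss stops improving monotonically. The sketch assumes the Trainer's default of logging the training loss every 500 steps (`LOGGING_STEPS` below); the card does not list that setting. The validation losses are copied from the table.

```python
import math

STEPS_PER_EPOCH = 13   # from the table (step 13 at epoch 1.0)
LOGGING_STEPS = 500    # assumption: Trainer default, not stated in the card

# Validation loss after epochs 1..50, copied from the table above.
VAL_LOSS = [
    4.2259, 2.4178, 1.9256, 1.7310, 1.6577, 1.6385, 1.6110, 1.5811, 1.5237, 1.4809,
    1.4378, 1.3976, 1.3462, 1.2587, 1.2260, 1.1018, 1.0273, 0.9436, 0.8007, 0.6919,
    0.6201, 0.5486, 0.4804, 0.4080, 0.3861, 0.3477, 0.3181, 0.2921, 0.2832, 0.2693,
    0.2469, 0.2453, 0.2313, 0.2134, 0.2139, 0.2088, 0.2007, 0.1960, 0.1830, 0.1782,
    0.1746, 0.1741, 0.1708, 0.1668, 0.1650, 0.1651, 0.1629, 0.1627, 0.1610, 0.1606,
]

# First epoch whose final step reaches the first training-loss log.
first_logged_epoch = math.ceil(LOGGING_STEPS / STEPS_PER_EPOCH)

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__) + 1

# Epochs where validation loss was higher than in the previous epoch.
regressions = [
    epoch
    for epoch in range(2, len(VAL_LOSS) + 1)
    if VAL_LOSS[epoch - 1] > VAL_LOSS[epoch - 2]
]

print("first epoch with a logged training loss:", first_logged_epoch)
print("best epoch:", best_epoch, "loss:", VAL_LOSS[best_epoch - 1])
print("epochs with a higher loss than the previous one:", regressions)
```

With 650 total steps and a 500-step logging interval, only one training-loss value would ever be logged, which would explain why the same 1.3 repeats from epoch 39 onward. That value is likely an average over the steps since the previous log (here, the first 500 steps), so it is not directly comparable with the validation loss on the same row.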

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
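For anyone trying to reproduce the run, a small check like the following can compare the local environment against the versions listed above. It is only a sketch: the PyPI package names (`torch` for Pytorch, and so on) and the choice to ignore the local build tag such as `+cu121` are assumptions, and `check_versions` is a hypothetical helper, not part of any library.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI package name.
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def base_version(version):
    """Drop a local build tag such as '+cu121'."""
    return version.split("+", 1)[0]


def check_versions(pinned=PINNED, lookup=metadata.version):
    """Return {package: (wanted, installed or None, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        matches = installed is not None and base_version(installed) == base_version(wanted)
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one recorded here.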