
byt5_3k

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0786
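The card does not say which loss this is. As a rough aid to interpretation, the snippet below assumes it is the mean per-token cross-entropy in nats. That is the default loss for Hugging Face seq2seq models. It also assumes the tokens are byte-level, as the "byt5" in the model name suggests. Neither assumption is confirmed by the card. Under those assumptions, the loss converts to a per-token perplexity and a bits-per-token figure as follows:

```python
import math

# Evaluation loss reported on this card.
# Assumption: mean per-token cross-entropy in nats (natural log), as computed
# by Hugging Face seq2seq models; tokens assumed byte-level (ByT5-style).
EVAL_LOSS = 0.0786

# Per-token perplexity is exp(cross-entropy in nats).
perplexity = math.exp(EVAL_LOSS)

# Dividing nats by ln(2) converts the loss to bits per token.
bits_per_token = EVAL_LOSS / math.log(2)

print(f"perplexity: {perplexity:.4f}")
print(f"bits per token: {bits_per_token:.4f}")
```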

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 800
  • eval_batch_size: 800
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
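The card lists a linear learning-rate scheduler. The sketch below assumes zero warmup steps, the Trainer default, since no warmup is listed. It also assumes 200 total optimizer steps, which is 50 epochs × 4 steps per epoch according to the training results table. Under those assumptions, the learning rate decays linearly from 5e-05 to 0. The helper `linear_lr` is hypothetical. It mirrors the formula used by `get_linear_schedule_with_warmup` in Transformers and is not taken from this model's training code.

```python
PEAK_LR = 5e-05      # learning_rate from the card
TOTAL_STEPS = 200    # 50 epochs x 4 steps/epoch, per the results table
WARMUP_STEPS = 0     # assumption: no warmup listed, so the default of 0


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Hypothetical helper: learning rate after `step` optimizer updates."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 50, 100, 150, 200):
        print(s, linear_lr(s))
```

With no warmup, the schedule reduces to lr(s) = 5e-05 × (200 − s) / 200. The learning rate is halved at step 100, the midpoint of training.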

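Training results

Validation loss was computed at the end of each epoch, which is every 4 optimizer steps. "No log" means that no training loss had been logged yet at that evaluation. Training loss was logged less often than evaluation ran, so a repeated training-loss value is the most recently logged value.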

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| No log        | 1.0   | 4    | 0.1739          |
| No log        | 2.0   | 8    | 0.1598          |
| 0.3456        | 3.0   | 12   | 0.1841          |
| 0.3456        | 4.0   | 16   | 0.1430          |
| 0.3252        | 5.0   | 20   | 0.1628          |
| 0.3252        | 6.0   | 24   | 0.1406          |
| 0.3252        | 7.0   | 28   | 0.1349          |
| 0.3058        | 8.0   | 32   | 0.1498          |
| 0.3058        | 9.0   | 36   | 0.1371          |
| 0.2953        | 10.0  | 40   | 0.1417          |
| 0.2953        | 11.0  | 44   | 0.1371          |
| 0.2953        | 12.0  | 48   | 0.1222          |
| 0.2827        | 13.0  | 52   | 0.1317          |
| 0.2827        | 14.0  | 56   | 0.1227          |
| 0.2719        | 15.0  | 60   | 0.1128          |
| 0.2719        | 16.0  | 64   | 0.1209          |
| 0.2719        | 17.0  | 68   | 0.1200          |
| 0.2698        | 18.0  | 72   | 0.1149          |
| 0.2698        | 19.0  | 76   | 0.1090          |
| 0.2473        | 20.0  | 80   | 0.1045          |
| 0.2473        | 21.0  | 84   | 0.1174          |
| 0.2473        | 22.0  | 88   | 0.1056          |
| 0.2492        | 23.0  | 92   | 0.0976          |
| 0.2492        | 24.0  | 96   | 0.1164          |
| 0.2377        | 25.0  | 100  | 0.0974          |
| 0.2377        | 26.0  | 104  | 0.0922          |
| 0.2377        | 27.0  | 108  | 0.1022          |
| 0.2312        | 28.0  | 112  | 0.0908          |
| 0.2312        | 29.0  | 116  | 0.0891          |
| 0.2254        | 30.0  | 120  | 0.0967          |
| 0.2254        | 31.0  | 124  | 0.0930          |
| 0.2254        | 32.0  | 128  | 0.0866          |
| 0.224         | 33.0  | 132  | 0.0828          |
| 0.224         | 34.0  | 136  | 0.0850          |
| 0.2166        | 35.0  | 140  | 0.0883          |
| 0.2166        | 36.0  | 144  | 0.0825          |
| 0.2166        | 37.0  | 148  | 0.0804          |
| 0.2128        | 38.0  | 152  | 0.0855          |
| 0.2128        | 39.0  | 156  | 0.0865          |
| 0.2048        | 40.0  | 160  | 0.0798          |
| 0.2048        | 41.0  | 164  | 0.0787          |
| 0.2048        | 42.0  | 168  | 0.0802          |
| 0.2063        | 43.0  | 172  | 0.0812          |
| 0.2063        | 44.0  | 176  | 0.0803          |
| 0.2075        | 45.0  | 180  | 0.0779          |
| 0.2075        | 46.0  | 184  | 0.0773          |
| 0.2075        | 47.0  | 188  | 0.0772          |
| 0.1997        | 48.0  | 192  | 0.0781          |
| 0.1997        | 49.0  | 196  | 0.0783          |
| 0.2074        | 50.0  | 200  | 0.0786          |
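The results table records 200 steps over 50 epochs, which is 4 optimizer steps per epoch. This arithmetic makes three assumptions. First, training ran on a single device. Second, there was no gradient accumulation. Third, the last partial batch was kept, which is the Trainer's default `dataloader_drop_last=False`. Under those assumptions, a batch size of 800 pins the training set to between 2,401 and 3,200 examples. The helpers below are hypothetical and only reproduce that arithmetic:

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches per epoch when the final partial batch is kept (drop_last=False)."""
    return math.ceil(num_examples / batch_size)


def dataset_size_range(steps: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest example counts that yield exactly `steps` batches."""
    return (steps - 1) * batch_size + 1, steps * batch_size


TOTAL_STEPS = 200   # final step in the results table
NUM_EPOCHS = 50     # num_epochs from the card
BATCH_SIZE = 800    # train_batch_size from the card (assumed single device)

steps = TOTAL_STEPS // NUM_EPOCHS
low, high = dataset_size_range(steps, BATCH_SIZE)
print(f"{steps} steps/epoch -> {low}..{high} training examples")
```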

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
Safetensors

  • Model size: 300M params
  • Tensor type: F32