ancient_semitic_bert

This model is a fine-tuned version of a base model that this card does not name; the training dataset is also unrecorded (listed as None). It achieves the following results on the evaluation set:

  • Loss: 1.9118
  • Perplexity: 6.77 (after 40 epochs)
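
The two figures above are consistent with each other if perplexity is taken as the exponential of the mean cross-entropy loss, which is the usual convention for language models. A minimal sketch of that check, assuming the loss is reported in nats (the card does not state the unit):

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.9118

# Assumption: the loss is a mean cross-entropy in nats, so
# perplexity = exp(loss).
perplexity = math.exp(eval_loss)

print(f"{perplexity:.2f}")  # prints 6.77
```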

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10000
  • num_epochs: 40.0
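
The listed values fit together arithmetically. The sketch below recomputes the total batch size and approximates the learning-rate curve implied by a linear scheduler with warmup. It assumes no gradient accumulation (none is listed), and that the rate decays linearly to zero at the final step reported under Training results (2212760); the function name `linear_lr` is illustrative and not part of any library.

```python
PER_DEVICE_BATCH = 16      # train_batch_size
NUM_DEVICES = 4            # num_devices
BASE_LR = 5e-05            # learning_rate
WARMUP_STEPS = 10_000      # lr_scheduler_warmup_steps
TOTAL_STEPS = 2_212_760    # final step in the training results

# Assumption: gradient accumulation is 1, so the total batch size
# is simply per-device batch size times device count.
total_batch = PER_DEVICE_BATCH * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Approximate learning rate at a given optimizer step.

    Linear warmup from 0 to BASE_LR over WARMUP_STEPS, then linear
    decay to 0 at TOTAL_STEPS (assumed schedule shape).
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(total_batch)  # 64, matching total_train_batch_size
for step in (0, 5_000, 10_000, 1_106_380, 2_212_760):
    print(step, linear_lr(step))
```

With 10000 warmup steps out of more than two million, warmup covers well under 1% of the run, so the rate is in its decay phase for almost all of training.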

Training results

Training Loss   Epoch   Step      Validation Loss
6.778           1.0     55319     6.4618
6.4271          2.0     110638    6.3701
6.3616          3.0     165957    6.3217
6.3257          4.0     221276    6.2966
6.3001          5.0     276595    6.2759
6.2834          6.0     331914    6.2610
6.2699          7.0     387233    6.2465
6.2565          8.0     442552    6.1939
6.2221          9.0     497871    6.1154
6.0721          10.0    553190    5.9524
5.9212          11.0    608509    5.7947
5.8113          12.0    663828    5.7161
5.7509          13.0    719147    5.6614
5.7053          14.0    774466    5.6158
5.6665          15.0    829785    5.5774
5.634           16.0    885104    5.5448
5.6055          17.0    940423    2.7563
3.3308          18.0    995742    2.5443
2.6179          19.0    1051061   2.4196
2.5324          20.0    1106380   2.3393
2.4791          21.0    1161699   2.2755
2.4105          22.0    1217018   2.2241
2.3582          23.0    1272337   2.1772
2.3281          24.0    1327656   2.1416
2.2987          25.0    1382975   2.1137
2.7859          26.0    1438294   2.0950
2.2728          27.0    1493613   2.0685
2.2308          28.0    1548932   2.0499
2.1739          29.0    1604251   2.0082
2.1569          30.0    1659570   1.9939
2.1425          31.0    1714889   1.9802
2.1318          32.0    1770208   1.9669
2.1207          33.0    1825527   1.9583
2.1111          34.0    1880846   1.9477
2.102           35.0    1936165   1.9409
2.0943          36.0    1991484   1.9313
2.0871          37.0    2046803   1.9236
2.0736          38.0    2102122   1.9191
2.0693          39.0    2157441   1.9147
2.0653          40.0    2212760   1.9118
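
The step counts in the results can be cross-checked against the hyperparameters. The sketch below confirms that 40 epochs of 55319 steps give the final step count, derives a rough upper bound on the number of training examples per epoch, and converts a few validation losses to perplexity. It assumes each step consumes one global batch of 64 with no gradient accumulation, and that the losses are in nats; the card confirms neither.

```python
import math

STEPS_PER_EPOCH = 55_319   # step count at epoch 1.0
NUM_EPOCHS = 40
TOTAL_TRAIN_BATCH = 64     # total_train_batch_size

# Every epoch adds the same number of steps, so the final step
# should be the per-epoch count times the number of epochs.
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

# Assumption: one global batch per step. The last batch of an epoch
# may be partial, so this is an upper bound, not an exact size.
max_examples_per_epoch = STEPS_PER_EPOCH * TOTAL_TRAIN_BATCH

# Validation losses copied from selected rows of the results.
val_loss = {1: 6.4618, 16: 5.5448, 17: 2.7563, 40: 1.9118}
val_perplexity = {epoch: math.exp(loss) for epoch, loss in val_loss.items()}

print(total_steps)             # 2212760
print(max_examples_per_epoch)  # 3540416
for epoch, ppl in val_perplexity.items():
    print(epoch, round(ppl, 2))
```

The sharp drop in validation loss between epochs 16 and 17 (5.5448 to 2.7563) stands out in these numbers; the card gives no explanation for it.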

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.3.0a0+ebedce2
  • Datasets 2.17.1
  • Tokenizers 0.15.2

Model size

  • Parameters: 126M
  • Tensor type: F32
  • Format: Safetensors

Datasets used to train mehdie/ancient_semitic_bert