GQA_BERT_German_legal_SQuAD_part_augmented_100

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0964

Model description

More information needed

Intended uses & limitations

More information needed
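
The card leaves intended use unspecified. Judging by the name, this is an extractive question-answering model for German legal text in SQuAD format. The snippet below is a minimal inference sketch, not an official usage guide: the Hub repo id, the question, and the context passage are all illustrative assumptions.

```python
from transformers import pipeline

# Assumption: the checkpoint is published under this repo id on the
# Hugging Face Hub; replace with the actual namespace/name if it differs.
MODEL_ID = "GQA_BERT_German_legal_SQuAD_part_augmented_100"

# BERT-style extractive QA checkpoints load via the standard
# question-answering pipeline.
qa = pipeline("question-answering", model=MODEL_ID)

# Illustrative German legal-style input (not taken from the training data).
result = qa(
    question="Wann tritt das Gesetz in Kraft?",
    context="Das Gesetz tritt gemäß Artikel 5 am 1. Januar 2024 in Kraft.",
)
print(result["answer"], result["score"])
```

The pipeline returns the highest-scoring answer span together with its score and character offsets, which is the usual output shape for extractive QA.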

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 160
  • eval_batch_size: 40
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
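
As a rough guide to reproducing this configuration, here is how the values above map onto `transformers.TrainingArguments` and `Trainer`. This is a sketch, not the authors' script: the base checkpoint and the `train_dataset`/`eval_dataset` objects are placeholders, and the listed Adam betas/epsilon are simply the library defaults.

```python
from transformers import AutoModelForQuestionAnswering, Trainer, TrainingArguments

# Placeholder base checkpoint: the card does not say which model was used.
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-german-cased")

args = TrainingArguments(
    output_dir="GQA_BERT_German_legal_SQuAD_part_augmented_100",
    learning_rate=2e-5,
    per_device_train_batch_size=160,  # card reports train_batch_size: 160 (assuming single device)
    per_device_eval_batch_size=40,    # card reports eval_batch_size: 40
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",      # consistent with the per-epoch losses below
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer settings listed above.
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: a tokenized QA train split
    eval_dataset=eval_dataset,    # placeholder: a tokenized QA validation split
)
trainer.train()
```

With 3 optimizer steps per epoch (see the results table below) at batch size 160, the training split is on the order of a few hundred examples.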

Training results

Training loss appears as "No log" because the run's 300 optimizer steps never reached a logging step (the Trainer logs training loss only every `logging_steps` steps, which defaults to 500). Validation loss bottoms out at 1.0218 around epoch 41 and drifts slightly upward for the remainder of training.

Training Loss  Epoch  Step  Validation Loss
-------------  -----  ----  ---------------
No log           1.0     3           5.1190
No log           2.0     6           4.5892
No log           3.0     9           3.9684
No log           4.0    12           3.6427
No log           5.0    15           3.2081
No log           6.0    18           2.8413
No log           7.0    21           2.5487
No log           8.0    24           2.2830
No log           9.0    27           2.0807
No log          10.0    30           1.8644
No log          11.0    33           1.7166
No log          12.0    36           1.5672
No log          13.0    39           1.3949
No log          14.0    42           1.3109
No log          15.0    45           1.2622
No log          16.0    48           1.1875
No log          17.0    51           1.1579
No log          18.0    54           1.1329
No log          19.0    57           1.1090
No log          20.0    60           1.0811
No log          21.0    63           1.0542
No log          22.0    66           1.0481
No log          23.0    69           1.0355
No log          24.0    72           1.0304
No log          25.0    75           1.0276
No log          26.0    78           1.0277
No log          27.0    81           1.0329
No log          28.0    84           1.0356
No log          29.0    87           1.0410
No log          30.0    90           1.0267
No log          31.0    93           1.0280
No log          32.0    96           1.0453
No log          33.0    99           1.0520
No log          34.0   102           1.0430
No log          35.0   105           1.0393
No log          36.0   108           1.0370
No log          37.0   111           1.0284
No log          38.0   114           1.0313
No log          39.0   117           1.0376
No log          40.0   120           1.0312
No log          41.0   123           1.0218
No log          42.0   126           1.0348
No log          43.0   129           1.0426
No log          44.0   132           1.0411
No log          45.0   135           1.0463
No log          46.0   138           1.0661
No log          47.0   141           1.0733
No log          48.0   144           1.0609
No log          49.0   147           1.0578
No log          50.0   150           1.0639
No log          51.0   153           1.0490
No log          52.0   156           1.0507
No log          53.0   159           1.0460
No log          54.0   162           1.0534
No log          55.0   165           1.0530
No log          56.0   168           1.0521
No log          57.0   171           1.0470
No log          58.0   174           1.0462
No log          59.0   177           1.0547
No log          60.0   180           1.0628
No log          61.0   183           1.0550
No log          62.0   186           1.0474
No log          63.0   189           1.0536
No log          64.0   192           1.0711
No log          65.0   195           1.0832
No log          66.0   198           1.0855
No log          67.0   201           1.0901
No log          68.0   204           1.0912
No log          69.0   207           1.0888
No log          70.0   210           1.0882
No log          71.0   213           1.0985
No log          72.0   216           1.1056
No log          73.0   219           1.0876
No log          74.0   222           1.0781
No log          75.0   225           1.0894
No log          76.0   228           1.0906
No log          77.0   231           1.0848
No log          78.0   234           1.0851
No log          79.0   237           1.0949
No log          80.0   240           1.0982
No log          81.0   243           1.0932
No log          82.0   246           1.0825
No log          83.0   249           1.0791
No log          84.0   252           1.0821
No log          85.0   255           1.0819
No log          86.0   258           1.0808
No log          87.0   261           1.0794
No log          88.0   264           1.0815
No log          89.0   267           1.0859
No log          90.0   270           1.0883
No log          91.0   273           1.0890
No log          92.0   276           1.0935
No log          93.0   279           1.0982
No log          94.0   282           1.1007
No log          95.0   285           1.0994
No log          96.0   288           1.0997
No log          97.0   291           1.0998
No log          98.0   294           1.0978
No log          99.0   297           1.0970
No log         100.0   300           1.0964

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0
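
To confirm your environment matches, the versions above can be checked programmatically; a small sketch (package names as imported in Python):

```python
import datasets
import tokenizers
import torch
import transformers

# Expected versions per this card: Transformers 4.36.2, PyTorch 2.1.2+cu121,
# Datasets 2.14.7, Tokenizers 0.15.0. The +cu121 suffix depends on the CUDA
# build you installed and may differ on your machine.
for name, module in [
    ("transformers", transformers),
    ("torch", torch),
    ("datasets", datasets),
    ("tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```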