BERT-legal-de-cased_German_legal_SQuAD_part_augmented_100

This model was trained from scratch. The training dataset is not recorded (the card lists it as None). It achieves the following results on the evaluation set:

  • Loss: 1.2932

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 160
  • eval_batch_size: 40
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
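
The schedule implied by these values can be sketched in plain Python. This is a minimal illustration, not the original training script: it assumes zero warmup steps, a single device, no gradient accumulation, and that the last partial batch is kept, none of which the card states. The figure of 3 steps per epoch is read from the results table, and the helper names `linear_lr` and `train_set_size_bounds` are hypothetical.

```python
# Values taken from the hyperparameter list and the results table.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 160
NUM_EPOCHS = 100
STEPS_PER_EPOCH = 3  # the results table advances 3 steps per epoch

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 300 optimizer steps


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def train_set_size_bounds(steps_per_epoch=STEPS_PER_EPOCH, batch_size=TRAIN_BATCH_SIZE):
    """Smallest and largest dataset size N with ceil(N / batch_size) == steps_per_epoch.

    Assumes one device, no gradient accumulation, and no dropped last batch.
    """
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


if __name__ == "__main__":
    for step in (0, 150, 300):
        print(f"step {step:3d}: lr = {linear_lr(step):.2e}")
    print("training examples (under the stated assumptions):", train_set_size_bounds())
```

Under those assumptions, the learning rate falls from 2e-05 at step 0 to zero at step 300, and the training set would hold between 321 and 480 examples.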

Training results

Training Loss | Epoch | Step | Validation Loss
No log | 1.0 | 3 | 6.0736
No log | 2.0 | 6 | 5.9879
No log | 3.0 | 9 | 5.5392
No log | 4.0 | 12 | 5.2221
No log | 5.0 | 15 | 5.0693
No log | 6.0 | 18 | 4.7391
No log | 7.0 | 21 | 4.4545
No log | 8.0 | 24 | 4.1691
No log | 9.0 | 27 | 3.9251
No log | 10.0 | 30 | 3.7327
No log | 11.0 | 33 | 3.5390
No log | 12.0 | 36 | 3.4126
No log | 13.0 | 39 | 3.2180
No log | 14.0 | 42 | 3.1053
No log | 15.0 | 45 | 2.9604
No log | 16.0 | 48 | 2.8479
No log | 17.0 | 51 | 2.7023
No log | 18.0 | 54 | 2.5315
No log | 19.0 | 57 | 2.4214
No log | 20.0 | 60 | 2.2855
No log | 21.0 | 63 | 2.2884
No log | 22.0 | 66 | 2.1014
No log | 23.0 | 69 | 2.0184
No log | 24.0 | 72 | 1.9246
No log | 25.0 | 75 | 1.9333
No log | 26.0 | 78 | 1.8171
No log | 27.0 | 81 | 1.7873
No log | 28.0 | 84 | 1.7801
No log | 29.0 | 87 | 1.5837
No log | 30.0 | 90 | 1.6417
No log | 31.0 | 93 | 1.5522
No log | 32.0 | 96 | 1.5645
No log | 33.0 | 99 | 1.4813
No log | 34.0 | 102 | 1.4647
No log | 35.0 | 105 | 1.5458
No log | 36.0 | 108 | 1.4655
No log | 37.0 | 111 | 1.4321
No log | 38.0 | 114 | 1.4592
No log | 39.0 | 117 | 1.3771
No log | 40.0 | 120 | 1.4014
No log | 41.0 | 123 | 1.4489
No log | 42.0 | 126 | 1.3550
No log | 43.0 | 129 | 1.4170
No log | 44.0 | 132 | 1.3729
No log | 45.0 | 135 | 1.3514
No log | 46.0 | 138 | 1.3448
No log | 47.0 | 141 | 1.3818
No log | 48.0 | 144 | 1.2925
No log | 49.0 | 147 | 1.3724
No log | 50.0 | 150 | 1.3596
No log | 51.0 | 153 | 1.3396
No log | 52.0 | 156 | 1.4308
No log | 53.0 | 159 | 1.3578
No log | 54.0 | 162 | 1.4014
No log | 55.0 | 165 | 1.3907
No log | 56.0 | 168 | 1.3847
No log | 57.0 | 171 | 1.3856
No log | 58.0 | 174 | 1.3461
No log | 59.0 | 177 | 1.3720
No log | 60.0 | 180 | 1.3300
No log | 61.0 | 183 | 1.3222
No log | 62.0 | 186 | 1.3197
No log | 63.0 | 189 | 1.3427
No log | 64.0 | 192 | 1.3049
No log | 65.0 | 195 | 1.3060
No log | 66.0 | 198 | 1.3300
No log | 67.0 | 201 | 1.3105
No log | 68.0 | 204 | 1.3084
No log | 69.0 | 207 | 1.3259
No log | 70.0 | 210 | 1.2938
No log | 71.0 | 213 | 1.2957
No log | 72.0 | 216 | 1.2767
No log | 73.0 | 219 | 1.2905
No log | 74.0 | 222 | 1.2884
No log | 75.0 | 225 | 1.2639
No log | 76.0 | 228 | 1.2781
No log | 77.0 | 231 | 1.2654
No log | 78.0 | 234 | 1.2681
No log | 79.0 | 237 | 1.2774
No log | 80.0 | 240 | 1.3002
No log | 81.0 | 243 | 1.3049
No log | 82.0 | 246 | 1.2959
No log | 83.0 | 249 | 1.2962
No log | 84.0 | 252 | 1.3013
No log | 85.0 | 255 | 1.2928
No log | 86.0 | 258 | 1.2826
No log | 87.0 | 261 | 1.2915
No log | 88.0 | 264 | 1.3069
No log | 89.0 | 267 | 1.3006
No log | 90.0 | 270 | 1.2940
No log | 91.0 | 273 | 1.2902
No log | 92.0 | 276 | 1.2833
No log | 93.0 | 279 | 1.2741
No log | 94.0 | 282 | 1.2840
No log | 95.0 | 285 | 1.2960
No log | 96.0 | 288 | 1.2978
No log | 97.0 | 291 | 1.2957
No log | 98.0 | 294 | 1.2948
No log | 99.0 | 297 | 1.2935
No log | 100.0 | 300 | 1.2932
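
The validation loss in the table does not bottom out at the final epoch, so it can help to locate the best epoch explicitly. The sketch below does this for epochs 70 to 100, where the lowest values occur; the losses are copied from the table, while the names `VAL_LOSS_FROM_EPOCH_70`, `FIRST_EPOCH`, and `best_epoch` are hypothetical. Whether a checkpoint from the best epoch was saved is not stated in the card.

```python
# Validation losses for epochs 70..100, copied from the results table.
FIRST_EPOCH = 70
VAL_LOSS_FROM_EPOCH_70 = [
    1.2938, 1.2957, 1.2767, 1.2905, 1.2884,  # epochs 70-74
    1.2639, 1.2781, 1.2654, 1.2681, 1.2774,  # epochs 75-79
    1.3002, 1.3049, 1.2959, 1.2962, 1.3013,  # epochs 80-84
    1.2928, 1.2826, 1.2915, 1.3069, 1.3006,  # epochs 85-89
    1.2940, 1.2902, 1.2833, 1.2741, 1.2840,  # epochs 90-94
    1.2960, 1.2978, 1.2957, 1.2948, 1.2935,  # epochs 95-99
    1.2932,                                  # epoch 100
]


def best_epoch(losses, first_epoch):
    """Return (epoch, loss) for the lowest validation loss in the list."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return first_epoch + idx, losses[idx]


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS_FROM_EPOCH_70, FIRST_EPOCH)
    final = VAL_LOSS_FROM_EPOCH_70[-1]
    print(f"best epoch: {epoch} (validation loss {loss:.4f})")
    print(f"final epoch loss {final:.4f}, gap to best {final - loss:.4f}")
```

The lowest validation loss is 1.2639 at epoch 75, about 0.03 below the reported final value of 1.2932. The Training Loss column reads "No log" in every row, which usually indicates that no training loss had been logged by the time of each evaluation.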

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0
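
To reproduce results it may be useful to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library; it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and it ignores local build suffixes such as `+cu121`. The helper names `base_version` and `find_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions listed in the card; "torch" is the assumed PyPI name for PyTorch.
EXPECTED = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.7",
    "tokenizers": "0.15.0",
}


def base_version(version):
    """Drop a local build suffix, e.g. '2.1.2+cu121' -> '2.1.2'."""
    return version.split("+", 1)[0]


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Map each package whose installed version differs to (wanted, found).

    found is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or base_version(found) != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches().items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means the installed versions match the card, up to the build suffix.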

Model size: 108M params (Safetensors, tensor type F32)