
GQA_BERT_German_legal_SQuAD_100

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9936

Model description

More information needed

Intended uses & limitations

More information needed
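
While the card itself gives no details, the model name suggests an extractive question-answering model for German legal text, trained on a SQuAD-style dataset. A minimal usage sketch, assuming the model is published under a hypothetical Hub repo id (replace it with the actual one):

```python
from transformers import pipeline

# Hypothetical repo id -- substitute the real model id on the Hub.
qa = pipeline(
    "question-answering",
    model="<user>/GQA_BERT_German_legal_SQuAD_100",
)

# German example: "When does the law come into force?" /
# "The law comes into force on 1 January 2025."
result = qa(
    question="Wann tritt das Gesetz in Kraft?",
    context="Das Gesetz tritt am 1. Januar 2025 in Kraft.",
)
print(result["answer"], result["score"])
```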

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 160
  • eval_batch_size: 40
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
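
As a rough guide to reproducing this run, the values above map onto `transformers.TrainingArguments` as sketched below; the output directory is an assumption, and anything not listed is left at its library default:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="GQA_BERT_German_legal_SQuAD_100",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=160,
    per_device_eval_batch_size=40,
    seed=42,
    # Adam settings listed above (these match the library defaults).
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
)
```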

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 1.0 | 2 | 5.4152 |
| No log | 2.0 | 4 | 4.3849 |
| No log | 3.0 | 6 | 3.8582 |
| No log | 4.0 | 8 | 3.4297 |
| No log | 5.0 | 10 | 3.0349 |
| No log | 6.0 | 12 | 2.6763 |
| No log | 7.0 | 14 | 2.3453 |
| No log | 8.0 | 16 | 2.0796 |
| No log | 9.0 | 18 | 1.8515 |
| No log | 10.0 | 20 | 1.6481 |
| No log | 11.0 | 22 | 1.4847 |
| No log | 12.0 | 24 | 1.3317 |
| No log | 13.0 | 26 | 1.1982 |
| No log | 14.0 | 28 | 1.1084 |
| No log | 15.0 | 30 | 1.0411 |
| No log | 16.0 | 32 | 0.9926 |
| No log | 17.0 | 34 | 0.9496 |
| No log | 18.0 | 36 | 0.9261 |
| No log | 19.0 | 38 | 0.9167 |
| No log | 20.0 | 40 | 0.8978 |
| No log | 21.0 | 42 | 0.8942 |
| No log | 22.0 | 44 | 0.9057 |
| No log | 23.0 | 46 | 0.9261 |
| No log | 24.0 | 48 | 0.9390 |
| No log | 25.0 | 50 | 0.9377 |
| No log | 26.0 | 52 | 0.9255 |
| No log | 27.0 | 54 | 0.9194 |
| No log | 28.0 | 56 | 0.9235 |
| No log | 29.0 | 58 | 0.9354 |
| No log | 30.0 | 60 | 0.9502 |
| No log | 31.0 | 62 | 0.9532 |
| No log | 32.0 | 64 | 0.9580 |
| No log | 33.0 | 66 | 0.9698 |
| No log | 34.0 | 68 | 0.9576 |
| No log | 35.0 | 70 | 0.9607 |
| No log | 36.0 | 72 | 0.9718 |
| No log | 37.0 | 74 | 0.9858 |
| No log | 38.0 | 76 | 1.0113 |
| No log | 39.0 | 78 | 1.0328 |
| No log | 40.0 | 80 | 1.0448 |
| No log | 41.0 | 82 | 1.0389 |
| No log | 42.0 | 84 | 1.0255 |
| No log | 43.0 | 86 | 1.0157 |
| No log | 44.0 | 88 | 1.0172 |
| No log | 45.0 | 90 | 1.0177 |
| No log | 46.0 | 92 | 1.0207 |
| No log | 47.0 | 94 | 1.0248 |
| No log | 48.0 | 96 | 1.0149 |
| No log | 49.0 | 98 | 0.9964 |
| No log | 50.0 | 100 | 0.9910 |
| No log | 51.0 | 102 | 0.9872 |
| No log | 52.0 | 104 | 0.9769 |
| No log | 53.0 | 106 | 0.9786 |
| No log | 54.0 | 108 | 0.9850 |
| No log | 55.0 | 110 | 1.0077 |
| No log | 56.0 | 112 | 1.0295 |
| No log | 57.0 | 114 | 1.0328 |
| No log | 58.0 | 116 | 1.0229 |
| No log | 59.0 | 118 | 0.9974 |
| No log | 60.0 | 120 | 0.9801 |
| No log | 61.0 | 122 | 0.9654 |
| No log | 62.0 | 124 | 0.9663 |
| No log | 63.0 | 126 | 0.9543 |
| No log | 64.0 | 128 | 0.9477 |
| No log | 65.0 | 130 | 0.9456 |
| No log | 66.0 | 132 | 0.9539 |
| No log | 67.0 | 134 | 0.9653 |
| No log | 68.0 | 136 | 0.9822 |
| No log | 69.0 | 138 | 1.0056 |
| No log | 70.0 | 140 | 1.0410 |
| No log | 71.0 | 142 | 1.0599 |
| No log | 72.0 | 144 | 1.0630 |
| No log | 73.0 | 146 | 1.0606 |
| No log | 74.0 | 148 | 1.0508 |
| No log | 75.0 | 150 | 1.0367 |
| No log | 76.0 | 152 | 1.0172 |
| No log | 77.0 | 154 | 1.0042 |
| No log | 78.0 | 156 | 0.9934 |
| No log | 79.0 | 158 | 0.9842 |
| No log | 80.0 | 160 | 0.9839 |
| No log | 81.0 | 162 | 0.9835 |
| No log | 82.0 | 164 | 0.9803 |
| No log | 83.0 | 166 | 0.9792 |
| No log | 84.0 | 168 | 0.9843 |
| No log | 85.0 | 170 | 0.9878 |
| No log | 86.0 | 172 | 0.9900 |
| No log | 87.0 | 174 | 0.9915 |
| No log | 88.0 | 176 | 0.9938 |
| No log | 89.0 | 178 | 0.9939 |
| No log | 90.0 | 180 | 0.9931 |
| No log | 91.0 | 182 | 0.9928 |
| No log | 92.0 | 184 | 0.9937 |
| No log | 93.0 | 186 | 0.9935 |
| No log | 94.0 | 188 | 0.9936 |
| No log | 95.0 | 190 | 0.9933 |
| No log | 96.0 | 192 | 0.9928 |
| No log | 97.0 | 194 | 0.9928 |
| No log | 98.0 | 196 | 0.9934 |
| No log | 99.0 | 198 | 0.9937 |
| No log | 100.0 | 200 | 0.9936 |
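
Note that the validation loss bottoms out at 0.8942 around epoch 21 and drifts upward afterwards, so by this metric the final checkpoint is not the best one from the run.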

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0

Model details

  • Format: Safetensors
  • Model size: 108M params
  • Tensor type: F32