
GQA_RoBERTa_German_legal_SQuAD_part_augmented_100

This model was trained from scratch on an unspecified dataset (the dataset field was left unset and is recorded as None). It achieves the following results on the evaluation set:

  • Loss: 0.8115

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 128
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
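
The list above does not mention warmup. As a rough sketch, assuming zero warmup steps and the 400 total optimizer steps reported in the training results, a linear schedule would decay the learning rate from 2e-05 to 0 as shown below. The function name `linear_lr` and the constant `WARMUP_STEPS` are hypothetical; this mirrors the shape of a linear warmup-then-decay schedule and is not the training code itself.

```python
BASE_LR = 2e-05      # learning_rate from the list above
TOTAL_STEPS = 400    # last step in the training results (100 epochs x 4 steps)
WARMUP_STEPS = 0     # assumption: the card lists no warmup setting


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 100, 200, 300, 400):
        print(f"step {step:3d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate is halved by step 200 (epoch 50) and reaches 0 at the final step.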

Training results

Training Loss | Epoch | Step | Validation Loss
No log        | 1.0   | 4    | 3.7779
No log        | 2.0   | 8    | 3.1298
No log        | 3.0   | 12   | 2.7509
No log        | 4.0   | 16   | 2.4195
No log        | 5.0   | 20   | 2.0781
No log        | 6.0   | 24   | 1.9657
No log        | 7.0   | 28   | 1.7131
No log        | 8.0   | 32   | 1.5142
No log        | 9.0   | 36   | 1.4133
No log        | 10.0  | 40   | 1.2182
No log        | 11.0  | 44   | 1.1278
No log        | 12.0  | 48   | 0.9984
No log        | 13.0  | 52   | 0.9583
No log        | 14.0  | 56   | 0.9222
No log        | 15.0  | 60   | 0.9156
No log        | 16.0  | 64   | 0.7979
No log        | 17.0  | 68   | 0.8205
No log        | 18.0  | 72   | 0.7654
No log        | 19.0  | 76   | 0.7767
No log        | 20.0  | 80   | 0.7633
No log        | 21.0  | 84   | 0.7075
No log        | 22.0  | 88   | 0.7287
No log        | 23.0  | 92   | 0.7088
No log        | 24.0  | 96   | 0.7165
No log        | 25.0  | 100  | 0.7375
No log        | 26.0  | 104  | 0.7581
No log        | 27.0  | 108  | 0.7481
No log        | 28.0  | 112  | 0.7394
No log        | 29.0  | 116  | 0.7362
No log        | 30.0  | 120  | 0.7300
No log        | 31.0  | 124  | 0.7306
No log        | 32.0  | 128  | 0.7348
No log        | 33.0  | 132  | 0.7495
No log        | 34.0  | 136  | 0.7526
No log        | 35.0  | 140  | 0.7432
No log        | 36.0  | 144  | 0.7492
No log        | 37.0  | 148  | 0.7356
No log        | 38.0  | 152  | 0.7347
No log        | 39.0  | 156  | 0.7415
No log        | 40.0  | 160  | 0.7401
No log        | 41.0  | 164  | 0.7340
No log        | 42.0  | 168  | 0.7388
No log        | 43.0  | 172  | 0.7358
No log        | 44.0  | 176  | 0.7471
No log        | 45.0  | 180  | 0.7642
No log        | 46.0  | 184  | 0.7823
No log        | 47.0  | 188  | 0.7659
No log        | 48.0  | 192  | 0.7476
No log        | 49.0  | 196  | 0.7545
No log        | 50.0  | 200  | 0.7568
No log        | 51.0  | 204  | 0.7658
No log        | 52.0  | 208  | 0.7750
No log        | 53.0  | 212  | 0.7738
No log        | 54.0  | 216  | 0.7714
No log        | 55.0  | 220  | 0.7765
No log        | 56.0  | 224  | 0.7865
No log        | 57.0  | 228  | 0.7902
No log        | 58.0  | 232  | 0.7816
No log        | 59.0  | 236  | 0.7863
No log        | 60.0  | 240  | 0.7992
No log        | 61.0  | 244  | 0.8242
No log        | 62.0  | 248  | 0.8399
No log        | 63.0  | 252  | 0.8415
No log        | 64.0  | 256  | 0.8285
No log        | 65.0  | 260  | 0.8209
No log        | 66.0  | 264  | 0.8182
No log        | 67.0  | 268  | 0.8241
No log        | 68.0  | 272  | 0.8260
No log        | 69.0  | 276  | 0.8195
No log        | 70.0  | 280  | 0.8186
No log        | 71.0  | 284  | 0.8180
No log        | 72.0  | 288  | 0.8138
No log        | 73.0  | 292  | 0.8066
No log        | 74.0  | 296  | 0.8007
No log        | 75.0  | 300  | 0.7992
No log        | 76.0  | 304  | 0.8054
No log        | 77.0  | 308  | 0.8121
No log        | 78.0  | 312  | 0.8173
No log        | 79.0  | 316  | 0.8279
No log        | 80.0  | 320  | 0.8365
No log        | 81.0  | 324  | 0.8280
No log        | 82.0  | 328  | 0.8165
No log        | 83.0  | 332  | 0.8094
No log        | 84.0  | 336  | 0.8064
No log        | 85.0  | 340  | 0.8037
No log        | 86.0  | 344  | 0.8060
No log        | 87.0  | 348  | 0.8084
No log        | 88.0  | 352  | 0.8112
No log        | 89.0  | 356  | 0.8121
No log        | 90.0  | 360  | 0.8155
No log        | 91.0  | 364  | 0.8201
No log        | 92.0  | 368  | 0.8253
No log        | 93.0  | 372  | 0.8252
No log        | 94.0  | 376  | 0.8227
No log        | 95.0  | 380  | 0.8195
No log        | 96.0  | 384  | 0.8156
No log        | 97.0  | 388  | 0.8132
No log        | 98.0  | 392  | 0.8125
No log        | 99.0  | 396  | 0.8117
No log        | 100.0 | 400  | 0.8115
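
The reported evaluation loss (0.8115) is the value from the final epoch, not the lowest one in the table. A minimal sketch for locating the best epoch is given below; it uses an excerpt of rows copied from the table (the excerpt includes the table's overall minimum), and the helper name `best_epoch` is hypothetical.

```python
# Excerpt of (epoch: validation loss) pairs copied from the training results table.
VAL_LOSS = {
    1: 3.7779, 10: 1.2182, 16: 0.7979, 20: 0.7633, 21: 0.7075,
    22: 0.7287, 23: 0.7088, 24: 0.7165, 30: 0.7300, 50: 0.7568,
    75: 0.7992, 100: 0.8115,
}
STEPS_PER_EPOCH = 4  # from the Step column: 4 optimizer steps per epoch


def best_epoch(val_loss: dict[int, float]) -> tuple[int, float]:
    """Return the (epoch, loss) pair with the lowest validation loss."""
    epoch = min(val_loss, key=val_loss.get)
    return epoch, val_loss[epoch]


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    final = VAL_LOSS[max(VAL_LOSS)]  # loss at the last epoch in the excerpt
    print(f"best: epoch {epoch} (step {epoch * STEPS_PER_EPOCH}), loss {loss:.4f}")
    print(f"final: loss {final:.4f}, {final - loss:.4f} above the best")
```

The validation loss bottoms out at epoch 21 and drifts upward afterwards, which is consistent with overfitting during the remaining epochs. The card does not say whether the published weights come from the last or the best checkpoint; if the run were repeated, options such as the Trainer's `load_best_model_at_end` or an `EarlyStoppingCallback` might be worth considering.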

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0

Model size

  • Parameters: 124M
  • Tensor type: F32
  • Weights format: Safetensors
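
As a back-of-the-envelope check, the parameter count and tensor type above give an approximate size for the stored weights. The sketch below assumes the rounded figure of 124M parameters (the exact count is not given) and 4 bytes per F32 value; the helper name `weights_size_mb` is hypothetical.

```python
PARAMS = 124_000_000   # "124M params" as displayed; the exact count is not given
BYTES_PER_PARAM = 4    # F32 means 32-bit (4-byte) floats


def weights_size_mb(params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_param / 1e6


if __name__ == "__main__":
    print(f"approx. weights size: {weights_size_mb(PARAMS, BYTES_PER_PARAM):.0f} MB")
```

This covers the weights only; memory needed at inference or training time (activations, optimizer state) comes on top of it.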