
Legal_GQA_7_BERT_augmented_100

This model was trained from scratch; the training dataset is not specified (listed as "None"). It achieves the following results on the evaluation set:

  • Loss: 4.8854

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 128
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
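
The results table below shows 4 optimizer steps per epoch, so 100 epochs correspond to 400 training steps in total. Assuming the linear scheduler ran with no warmup (the Transformers default when no warmup is configured), the learning rate decays linearly from 2e-05 to 0 over those 400 steps. A minimal sketch with a hypothetical helper that mirrors that schedule:

```python
def linear_lr(step, total_steps=400, base_lr=2e-05, warmup_steps=0):
    """Linear schedule: ramp up over warmup_steps, then decay to 0 at total_steps.

    Mirrors the behavior of the Transformers linear scheduler; this helper
    is illustrative, not part of the actual training code.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

print(linear_lr(0))    # start of training: 2e-05
print(linear_lr(200))  # midpoint: 1e-05
print(linear_lr(400))  # end of training: 0.0
```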

Training results

Training Loss | Epoch | Step | Validation Loss
No log | 1.0 | 4 | 2.9375
No log | 2.0 | 8 | 2.5861
No log | 3.0 | 12 | 3.0236
No log | 4.0 | 16 | 2.3145
No log | 5.0 | 20 | 2.7110
No log | 6.0 | 24 | 2.4009
No log | 7.0 | 28 | 2.6089
No log | 8.0 | 32 | 2.5080
No log | 9.0 | 36 | 2.6943
No log | 10.0 | 40 | 2.6713
No log | 11.0 | 44 | 3.0227
No log | 12.0 | 48 | 2.8381
No log | 13.0 | 52 | 3.2355
No log | 14.0 | 56 | 2.9510
No log | 15.0 | 60 | 3.3167
No log | 16.0 | 64 | 3.2990
No log | 17.0 | 68 | 3.4914
No log | 18.0 | 72 | 3.5478
No log | 19.0 | 76 | 3.7819
No log | 20.0 | 80 | 3.7423
No log | 21.0 | 84 | 3.7653
No log | 22.0 | 88 | 3.9264
No log | 23.0 | 92 | 3.7901
No log | 24.0 | 96 | 4.0258
No log | 25.0 | 100 | 4.1388
No log | 26.0 | 104 | 4.1338
No log | 27.0 | 108 | 4.0925
No log | 28.0 | 112 | 4.0685
No log | 29.0 | 116 | 4.2066
No log | 30.0 | 120 | 4.3976
No log | 31.0 | 124 | 4.2297
No log | 32.0 | 128 | 4.4429
No log | 33.0 | 132 | 4.4769
No log | 34.0 | 136 | 4.6924
No log | 35.0 | 140 | 4.5341
No log | 36.0 | 144 | 4.4352
No log | 37.0 | 148 | 4.4956
No log | 38.0 | 152 | 4.5124
No log | 39.0 | 156 | 4.4433
No log | 40.0 | 160 | 4.5376
No log | 41.0 | 164 | 4.4187
No log | 42.0 | 168 | 4.6840
No log | 43.0 | 172 | 4.8962
No log | 44.0 | 176 | 4.6352
No log | 45.0 | 180 | 4.6857
No log | 46.0 | 184 | 4.7973
No log | 47.0 | 188 | 4.8357
No log | 48.0 | 192 | 4.8215
No log | 49.0 | 196 | 4.8593
No log | 50.0 | 200 | 4.7425
No log | 51.0 | 204 | 4.6979
No log | 52.0 | 208 | 4.7642
No log | 53.0 | 212 | 4.9259
No log | 54.0 | 216 | 5.0124
No log | 55.0 | 220 | 5.1167
No log | 56.0 | 224 | 5.0260
No log | 57.0 | 228 | 4.8341
No log | 58.0 | 232 | 4.8657
No log | 59.0 | 236 | 4.8196
No log | 60.0 | 240 | 4.7984
No log | 61.0 | 244 | 5.0060
No log | 62.0 | 248 | 4.9326
No log | 63.0 | 252 | 4.7038
No log | 64.0 | 256 | 4.7326
No log | 65.0 | 260 | 5.0008
No log | 66.0 | 264 | 5.1227
No log | 67.0 | 268 | 4.8750
No log | 68.0 | 272 | 4.6740
No log | 69.0 | 276 | 4.9472
No log | 70.0 | 280 | 5.0634
No log | 71.0 | 284 | 4.9791
No log | 72.0 | 288 | 4.9960
No log | 73.0 | 292 | 4.9437
No log | 74.0 | 296 | 4.8558
No log | 75.0 | 300 | 4.8548
No log | 76.0 | 304 | 4.9371
No log | 77.0 | 308 | 4.8281
No log | 78.0 | 312 | 4.8555
No log | 79.0 | 316 | 5.0903
No log | 80.0 | 320 | 5.1344
No log | 81.0 | 324 | 5.0305
No log | 82.0 | 328 | 4.9848
No log | 83.0 | 332 | 4.9658
No log | 84.0 | 336 | 4.8907
No log | 85.0 | 340 | 4.8319
No log | 86.0 | 344 | 4.8355
No log | 87.0 | 348 | 4.8083
No log | 88.0 | 352 | 4.8290
No log | 89.0 | 356 | 4.9148
No log | 90.0 | 360 | 4.9964
No log | 91.0 | 364 | 5.0250
No log | 92.0 | 368 | 4.9765
No log | 93.0 | 372 | 4.9332
No log | 94.0 | 376 | 4.9085
No log | 95.0 | 380 | 4.8835
No log | 96.0 | 384 | 4.8701
No log | 97.0 | 388 | 4.8764
No log | 98.0 | 392 | 4.8855
No log | 99.0 | 396 | 4.8869
No log | 100.0 | 400 | 4.8854
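
Note that validation loss bottoms out early (2.3145 at epoch 4) and trends upward thereafter, a typical overfitting pattern when training for 100 epochs on a small dataset (4 steps per epoch). A minimal sketch, using a hypothetical helper and the first rows of the table above, of how one might pick the best checkpoint from such a log:

```python
# (epoch, validation_loss) pairs taken from the first rows of the table above
history = [(1, 2.9375), (2, 2.5861), (3, 3.0236), (4, 2.3145),
           (5, 2.7110), (6, 2.4009), (7, 2.6089), (8, 2.5080)]

def best_epoch(history):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(history, key=lambda pair: pair[1])

epoch, loss = best_epoch(history)
print(epoch, loss)  # 4 2.3145
```

In practice, the Transformers Trainer can do this automatically by setting `load_best_model_at_end=True` (with `metric_for_best_model="eval_loss"`) in `TrainingArguments`, which restores the lowest-loss checkpoint at the end of training; it is not clear from this card whether that was used here.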

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0

Model details

  • Format: Safetensors
  • Model size: 108M params
  • Tensor type: F32