
20230903230355

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the results):

  • Loss: 0.6499
  • Accuracy: 0.5
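The card does not state which SuperGLUE subtask this checkpoint was trained on, so the following is only a loading sketch: it assumes the checkpoint exposes a standard sequence-classification head and accepts a sentence-pair input. The model id `dkqjrm/20230903230355` is taken from this card; the input texts are placeholders.

```python
# Hedged loading sketch; the SuperGLUE subtask and label meaning are not
# documented in this card, so inputs and label interpretation are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230903230355"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder sentence pair; replace with inputs for the actual subtask.
inputs = tokenizer("first sentence", "second sentence", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```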

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
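These settings map directly onto `transformers.TrainingArguments`. The sketch below expresses only the values reported above; the output directory is a placeholder, and any flags the original run may have used beyond these (e.g. evaluation or logging strategy) are not documented here and are omitted.

```python
# Hedged sketch: the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230903230355",   # placeholder output path
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,                         # seed: 11
    adam_beta1=0.9,                  # optimizer: Adam, betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon: 1e-08
    lr_scheduler_type="linear",      # lr_scheduler_type: linear
    num_train_epochs=80.0,           # num_epochs: 80.0
)
```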

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6198          | 0.5      |
| 0.6345        | 2.0   | 680   | 0.6217          | 0.5      |
| 0.6271        | 3.0   | 1020  | 0.6081          | 0.5      |
| 0.6271        | 4.0   | 1360  | 0.6146          | 0.5      |
| 0.6166        | 5.0   | 1700  | 0.6180          | 0.5      |
| 0.619         | 6.0   | 2040  | 0.6220          | 0.5      |
| 0.619         | 7.0   | 2380  | 0.6023          | 0.5      |
| 0.605         | 8.0   | 2720  | 0.5987          | 0.5      |
| 0.5863        | 9.0   | 3060  | 0.6086          | 0.5016   |
| 0.5863        | 10.0  | 3400  | 0.6292          | 0.5047   |
| 0.5789        | 11.0  | 3740  | 0.6150          | 0.5016   |
| 0.5716        | 12.0  | 4080  | 0.5969          | 0.5      |
| 0.5716        | 13.0  | 4420  | 0.6045          | 0.5      |
| 0.5599        | 14.0  | 4760  | 0.6281          | 0.4969   |
| 0.5555        | 15.0  | 5100  | 0.6021          | 0.5      |
| 0.5555        | 16.0  | 5440  | 0.6161          | 0.5      |
| 0.553         | 17.0  | 5780  | 0.6050          | 0.5      |
| 0.5412        | 18.0  | 6120  | 0.6483          | 0.4984   |
| 0.5412        | 19.0  | 6460  | 0.6169          | 0.5      |
| 0.5403        | 20.0  | 6800  | 0.6287          | 0.5      |
| 0.5349        | 21.0  | 7140  | 0.6369          | 0.5      |
| 0.5349        | 22.0  | 7480  | 0.6163          | 0.5      |
| 0.5341        | 23.0  | 7820  | 0.6180          | 0.4984   |
| 0.5264        | 24.0  | 8160  | 0.6171          | 0.5      |
| 0.5265        | 25.0  | 8500  | 0.6289          | 0.5      |
| 0.5265        | 26.0  | 8840  | 0.6161          | 0.5      |
| 0.5218        | 27.0  | 9180  | 0.6542          | 0.4984   |
| 0.5204        | 28.0  | 9520  | 0.6246          | 0.5      |
| 0.5204        | 29.0  | 9860  | 0.6192          | 0.5      |
| 0.5164        | 30.0  | 10200 | 0.6213          | 0.5      |
| 0.5136        | 31.0  | 10540 | 0.6256          | 0.5      |
| 0.5136        | 32.0  | 10880 | 0.6605          | 0.5      |
| 0.5113        | 33.0  | 11220 | 0.6310          | 0.5      |
| 0.5101        | 34.0  | 11560 | 0.6348          | 0.5      |
| 0.5101        | 35.0  | 11900 | 0.6392          | 0.5      |
| 0.5095        | 36.0  | 12240 | 0.6291          | 0.5      |
| 0.5058        | 37.0  | 12580 | 0.6399          | 0.5      |
| 0.5058        | 38.0  | 12920 | 0.6546          | 0.5      |
| 0.5022        | 39.0  | 13260 | 0.6294          | 0.5      |
| 0.5009        | 40.0  | 13600 | 0.6348          | 0.5      |
| 0.5009        | 41.0  | 13940 | 0.6261          | 0.5      |
| 0.5005        | 42.0  | 14280 | 0.6442          | 0.5      |
| 0.4952        | 43.0  | 14620 | 0.6338          | 0.5      |
| 0.4952        | 44.0  | 14960 | 0.6358          | 0.5      |
| 0.5019        | 45.0  | 15300 | 0.6387          | 0.5      |
| 0.4968        | 46.0  | 15640 | 0.6383          | 0.5      |
| 0.4968        | 47.0  | 15980 | 0.6361          | 0.5      |
| 0.4972        | 48.0  | 16320 | 0.6428          | 0.4984   |
| 0.4947        | 49.0  | 16660 | 0.6308          | 0.5      |
| 0.4958        | 50.0  | 17000 | 0.6443          | 0.5      |
| 0.4958        | 51.0  | 17340 | 0.6520          | 0.5      |
| 0.4926        | 52.0  | 17680 | 0.6491          | 0.5      |
| 0.4942        | 53.0  | 18020 | 0.6400          | 0.5      |
| 0.4942        | 54.0  | 18360 | 0.6373          | 0.5      |
| 0.4895        | 55.0  | 18700 | 0.6579          | 0.5      |
| 0.4908        | 56.0  | 19040 | 0.6611          | 0.5      |
| 0.4908        | 57.0  | 19380 | 0.6474          | 0.5      |
| 0.4916        | 58.0  | 19720 | 0.6537          | 0.5      |
| 0.492         | 59.0  | 20060 | 0.6507          | 0.5      |
| 0.492         | 60.0  | 20400 | 0.6582          | 0.5      |
| 0.4855        | 61.0  | 20740 | 0.6578          | 0.5      |
| 0.4874        | 62.0  | 21080 | 0.6498          | 0.5      |
| 0.4874        | 63.0  | 21420 | 0.6445          | 0.5      |
| 0.485         | 64.0  | 21760 | 0.6470          | 0.5      |
| 0.4889        | 65.0  | 22100 | 0.6483          | 0.5      |
| 0.4889        | 66.0  | 22440 | 0.6412          | 0.5      |
| 0.4778        | 67.0  | 22780 | 0.6437          | 0.5      |
| 0.4862        | 68.0  | 23120 | 0.6509          | 0.5      |
| 0.4862        | 69.0  | 23460 | 0.6491          | 0.5      |
| 0.4834        | 70.0  | 23800 | 0.6485          | 0.5      |
| 0.4802        | 71.0  | 24140 | 0.6444          | 0.5      |
| 0.4802        | 72.0  | 24480 | 0.6460          | 0.5      |
| 0.4818        | 73.0  | 24820 | 0.6500          | 0.5      |
| 0.4815        | 74.0  | 25160 | 0.6549          | 0.5      |
| 0.4804        | 75.0  | 25500 | 0.6577          | 0.5      |
| 0.4804        | 76.0  | 25840 | 0.6533          | 0.5      |
| 0.4812        | 77.0  | 26180 | 0.6516          | 0.5      |
| 0.4801        | 78.0  | 26520 | 0.6513          | 0.5      |
| 0.4801        | 79.0  | 26860 | 0.6519          | 0.5      |
| 0.48          | 80.0  | 27200 | 0.6499          | 0.5      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3