
20230831185034

This model is a fine-tuned version of bert-large-cased on the super_glue dataset (the card does not record which SuperGLUE task was used). It achieves the following results on the evaluation set; a usage sketch follows the results:

  • Loss: 0.6514
  • Accuracy: 0.5
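
The checkpoint is published under the repository id dkqjrm/20230831185034. The sketch below loads it for inference; because the card does not state the SuperGLUE task, the paired input texts are placeholders only.

```python
# Minimal inference sketch. The repo id comes from this card; the input
# texts are placeholders because the SuperGLUE task is not stated.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831185034"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Most SuperGLUE tasks are sentence-pair classification, so two texts are passed.
inputs = tokenizer("first input text", "second input text", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)
```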

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
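
As a sketch only, the settings above map onto transformers TrainingArguments as shown below; the actual training script, dataset wiring, and metric computation are not part of this card, so anything beyond the listed values is an assumption.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# output_dir and the per-epoch evaluation strategy are assumptions
# (the results table reports one evaluation per epoch); the real
# training script is not shown in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831185034",   # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumed from the per-epoch results table
)
```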

Training results

Validation accuracy stays at 0.5 for nearly every epoch, which is consistent with chance-level performance on a balanced binary task; the only readings above chance are brief (0.5125 at epoch 4, 0.5470 at epoch 12, 0.5313 at epoch 17). The full per-epoch results:

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|--------------:|------:|------:|----------------:|---------:|
| No log        | 1.0   | 340   | 0.6328          | 0.5      |
| 0.6373        | 2.0   | 680   | 0.6187          | 0.5      |
| 0.6335        | 3.0   | 1020  | 0.6199          | 0.5      |
| 0.6335        | 4.0   | 1360  | 0.6402          | 0.5125   |
| 0.6141        | 5.0   | 1700  | 0.6342          | 0.5      |
| 0.6035        | 6.0   | 2040  | 0.6137          | 0.5      |
| 0.6035        | 7.0   | 2380  | 0.6125          | 0.5      |
| 0.599         | 8.0   | 2720  | 0.6656          | 0.5      |
| 0.6086        | 9.0   | 3060  | 0.6465          | 0.5      |
| 0.6086        | 10.0  | 3400  | 0.6109          | 0.5      |
| 0.5913        | 11.0  | 3740  | 0.6273          | 0.5      |
| 0.5775        | 12.0  | 4080  | 0.6811          | 0.5470   |
| 0.5775        | 13.0  | 4420  | 0.6180          | 0.5      |
| 0.5687        | 14.0  | 4760  | 0.6692          | 0.5      |
| 0.5668        | 15.0  | 5100  | 0.6105          | 0.5      |
| 0.5668        | 16.0  | 5440  | 0.6322          | 0.5      |
| 0.5643        | 17.0  | 5780  | 0.6456          | 0.5313   |
| 0.5529        | 18.0  | 6120  | 0.6209          | 0.5      |
| 0.5529        | 19.0  | 6460  | 0.6351          | 0.5      |
| 0.5533        | 20.0  | 6800  | 0.6468          | 0.5      |
| 0.5494        | 21.0  | 7140  | 0.6303          | 0.5      |
| 0.5494        | 22.0  | 7480  | 0.6105          | 0.5      |
| 0.5514        | 23.0  | 7820  | 0.6282          | 0.5      |
| 0.54          | 24.0  | 8160  | 0.6196          | 0.5      |
| 0.541         | 25.0  | 8500  | 0.6838          | 0.5      |
| 0.541         | 26.0  | 8840  | 0.6171          | 0.5      |
| 0.5378        | 27.0  | 9180  | 0.6537          | 0.5      |
| 0.5344        | 28.0  | 9520  | 0.6543          | 0.5      |
| 0.5344        | 29.0  | 9860  | 0.6515          | 0.5      |
| 0.5303        | 30.0  | 10200 | 0.6314          | 0.5      |
| 0.5284        | 31.0  | 10540 | 0.6371          | 0.5      |
| 0.5284        | 32.0  | 10880 | 0.6739          | 0.5      |
| 0.5252        | 33.0  | 11220 | 0.6632          | 0.5      |
| 0.5232        | 34.0  | 11560 | 0.6564          | 0.5      |
| 0.5232        | 35.0  | 11900 | 0.6271          | 0.5      |
| 0.521         | 36.0  | 12240 | 0.6306          | 0.5      |
| 0.5215        | 37.0  | 12580 | 0.6324          | 0.5      |
| 0.5215        | 38.0  | 12920 | 0.7030          | 0.4984   |
| 0.5177        | 39.0  | 13260 | 0.6432          | 0.5      |
| 0.5136        | 40.0  | 13600 | 0.6151          | 0.5      |
| 0.5136        | 41.0  | 13940 | 0.6601          | 0.5      |
| 0.5153        | 42.0  | 14280 | 0.6176          | 0.5      |
| 0.5114        | 43.0  | 14620 | 0.6579          | 0.5      |
| 0.5114        | 44.0  | 14960 | 0.6584          | 0.5      |
| 0.514         | 45.0  | 15300 | 0.6408          | 0.5      |
| 0.5093        | 46.0  | 15640 | 0.6490          | 0.5      |
| 0.5093        | 47.0  | 15980 | 0.6457          | 0.5      |
| 0.5105        | 48.0  | 16320 | 0.6642          | 0.5      |
| 0.5067        | 49.0  | 16660 | 0.6358          | 0.5      |
| 0.5069        | 50.0  | 17000 | 0.6318          | 0.5      |
| 0.5069        | 51.0  | 17340 | 0.6718          | 0.5      |
| 0.5059        | 52.0  | 17680 | 0.6658          | 0.5      |
| 0.5064        | 53.0  | 18020 | 0.6414          | 0.5      |
| 0.5064        | 54.0  | 18360 | 0.6285          | 0.5      |
| 0.5034        | 55.0  | 18700 | 0.6793          | 0.5      |
| 0.4992        | 56.0  | 19040 | 0.6790          | 0.5      |
| 0.4992        | 57.0  | 19380 | 0.6370          | 0.5      |
| 0.5013        | 58.0  | 19720 | 0.6795          | 0.5      |
| 0.5015        | 59.0  | 20060 | 0.6312          | 0.5      |
| 0.5015        | 60.0  | 20400 | 0.6487          | 0.5      |
| 0.4984        | 61.0  | 20740 | 0.6539          | 0.5      |
| 0.499         | 62.0  | 21080 | 0.6254          | 0.5      |
| 0.499         | 63.0  | 21420 | 0.6403          | 0.5      |
| 0.4977        | 64.0  | 21760 | 0.6619          | 0.5      |
| 0.4992        | 65.0  | 22100 | 0.6459          | 0.5      |
| 0.4992        | 66.0  | 22440 | 0.6428          | 0.5      |
| 0.4899        | 67.0  | 22780 | 0.6488          | 0.5      |
| 0.5006        | 68.0  | 23120 | 0.6486          | 0.5      |
| 0.5006        | 69.0  | 23460 | 0.6512          | 0.5      |
| 0.4971        | 70.0  | 23800 | 0.6509          | 0.5      |
| 0.496         | 71.0  | 24140 | 0.6758          | 0.5      |
| 0.496         | 72.0  | 24480 | 0.6587          | 0.5      |
| 0.49          | 73.0  | 24820 | 0.6529          | 0.5      |
| 0.4939        | 74.0  | 25160 | 0.6659          | 0.5      |
| 0.492         | 75.0  | 25500 | 0.6504          | 0.5      |
| 0.492         | 76.0  | 25840 | 0.6531          | 0.5      |
| 0.4934        | 77.0  | 26180 | 0.6529          | 0.5      |
| 0.491         | 78.0  | 26520 | 0.6498          | 0.5      |
| 0.491         | 79.0  | 26860 | 0.6515          | 0.5      |
| 0.4895        | 80.0  | 27200 | 0.6514          | 0.5      |
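
For reference, the per-epoch accuracy above could be reproduced with a loop like the one below. Since the card does not name the SuperGLUE config, "boolq" and its "question"/"passage" fields are placeholders only; substitute the actual task and its fields.

```python
# Evaluation sketch. The SuperGLUE config is not stated in this card;
# "boolq" and its "question"/"passage" fields below are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831185034"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

dataset = load_dataset("super_glue", "boolq", split="validation")

correct = 0
for example in dataset:
    inputs = tokenizer(
        example["question"], example["passage"],
        truncation=True, max_length=512, return_tensors="pt",
    )
    with torch.no_grad():
        prediction = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(prediction == example["label"])

print(f"accuracy: {correct / len(dataset):.4f}")
```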

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3