
20230903172232

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.5325
  • Accuracy: 0.6552
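
The snippet below is a minimal sketch of loading this checkpoint for sequence classification. The repo id comes from this card, but the card does not state which SuperGLUE subtask was used, so the single-sentence input (and whether a sentence pair is actually expected) is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the checkpoint is a standard sequence-classification head on
# bert-large-cased; the repo id is taken from this card.
model_id = "dkqjrm/20230903172232"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical input: the exact SuperGLUE subtask (and hence whether the
# model expects one sentence or a pair) is not stated in this card.
inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```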

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
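
A minimal sketch of TrainingArguments mirroring the list above, using the Transformers 4.26 Trainer API; the output directory and the two commented flags are assumptions, not values stated in the card:

```python
from transformers import TrainingArguments

# Adam betas=(0.9, 0.999), epsilon=1e-08, and the linear schedule are the
# Trainer defaults, so they need no explicit flags here.
training_args = TrainingArguments(
    output_dir="./20230903172232",  # placeholder, not from the card
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch results table
    logging_steps=500,  # assumption: would explain the "No log" first row below
)
```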

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.4929          | 0.5      |
| 0.5083        | 2.0   | 680   | 0.4935          | 0.5      |
| 0.5024        | 3.0   | 1020  | 0.4979          | 0.5      |
| 0.5024        | 4.0   | 1360  | 0.4791          | 0.5      |
| 0.4909        | 5.0   | 1700  | 0.4873          | 0.5470   |
| 0.4734        | 6.0   | 2040  | 0.4883          | 0.5345   |
| 0.4734        | 7.0   | 2380  | 0.4729          | 0.5502   |
| 0.4616        | 8.0   | 2720  | 0.4693          | 0.4922   |
| 0.446         | 9.0   | 3060  | 0.4777          | 0.5549   |
| 0.446         | 10.0  | 3400  | 0.5197          | 0.6348   |
| 0.444         | 11.0  | 3740  | 0.4987          | 0.6207   |
| 0.4366        | 12.0  | 4080  | 0.4764          | 0.5846   |
| 0.4366        | 13.0  | 4420  | 0.4807          | 0.5705   |
| 0.4257        | 14.0  | 4760  | 0.5061          | 0.6332   |
| 0.4205        | 15.0  | 5100  | 0.4879          | 0.5204   |
| 0.4205        | 16.0  | 5440  | 0.5076          | 0.6301   |
| 0.419         | 17.0  | 5780  | 0.4885          | 0.5909   |
| 0.4103        | 18.0  | 6120  | 0.5273          | 0.6583   |
| 0.4103        | 19.0  | 6460  | 0.4833          | 0.5423   |
| 0.4107        | 20.0  | 6800  | 0.5060          | 0.5784   |
| 0.4015        | 21.0  | 7140  | 0.5064          | 0.6489   |
| 0.4015        | 22.0  | 7480  | 0.4873          | 0.5298   |
| 0.4032        | 23.0  | 7820  | 0.5016          | 0.6458   |
| 0.3949        | 24.0  | 8160  | 0.4993          | 0.6301   |
| 0.3961        | 25.0  | 8500  | 0.4975          | 0.6113   |
| 0.3961        | 26.0  | 8840  | 0.4924          | 0.5674   |
| 0.3917        | 27.0  | 9180  | 0.5187          | 0.6708   |
| 0.3894        | 28.0  | 9520  | 0.4951          | 0.5909   |
| 0.3894        | 29.0  | 9860  | 0.5029          | 0.6113   |
| 0.3867        | 30.0  | 10200 | 0.5276          | 0.6677   |
| 0.3842        | 31.0  | 10540 | 0.5023          | 0.6285   |
| 0.3842        | 32.0  | 10880 | 0.5175          | 0.6599   |
| 0.3845        | 33.0  | 11220 | 0.5094          | 0.6348   |
| 0.3798        | 34.0  | 11560 | 0.5120          | 0.6411   |
| 0.3798        | 35.0  | 11900 | 0.5237          | 0.6646   |
| 0.3799        | 36.0  | 12240 | 0.5030          | 0.5737   |
| 0.3807        | 37.0  | 12580 | 0.5234          | 0.6520   |
| 0.3807        | 38.0  | 12920 | 0.5183          | 0.6536   |
| 0.373         | 39.0  | 13260 | 0.5078          | 0.6034   |
| 0.375         | 40.0  | 13600 | 0.5172          | 0.6536   |
| 0.375         | 41.0  | 13940 | 0.5164          | 0.6505   |
| 0.3738        | 42.0  | 14280 | 0.5180          | 0.6332   |
| 0.369         | 43.0  | 14620 | 0.5145          | 0.6301   |
| 0.369         | 44.0  | 14960 | 0.5153          | 0.6223   |
| 0.3722        | 45.0  | 15300 | 0.5289          | 0.6818   |
| 0.3685        | 46.0  | 15640 | 0.5203          | 0.6567   |
| 0.3685        | 47.0  | 15980 | 0.5210          | 0.6285   |
| 0.3688        | 48.0  | 16320 | 0.5113          | 0.6144   |
| 0.3661        | 49.0  | 16660 | 0.5097          | 0.5439   |
| 0.3657        | 50.0  | 17000 | 0.5166          | 0.6536   |
| 0.3657        | 51.0  | 17340 | 0.5208          | 0.6552   |
| 0.3656        | 52.0  | 17680 | 0.5249          | 0.6646   |
| 0.3643        | 53.0  | 18020 | 0.5056          | 0.5940   |
| 0.3643        | 54.0  | 18360 | 0.5122          | 0.6583   |
| 0.3611        | 55.0  | 18700 | 0.5247          | 0.6395   |
| 0.3629        | 56.0  | 19040 | 0.5301          | 0.6599   |
| 0.3629        | 57.0  | 19380 | 0.5284          | 0.6473   |
| 0.3597        | 58.0  | 19720 | 0.5316          | 0.6473   |
| 0.361         | 59.0  | 20060 | 0.5315          | 0.6552   |
| 0.361         | 60.0  | 20400 | 0.5424          | 0.6567   |
| 0.3587        | 61.0  | 20740 | 0.5338          | 0.6442   |
| 0.3557        | 62.0  | 21080 | 0.5283          | 0.6285   |
| 0.3557        | 63.0  | 21420 | 0.5287          | 0.6599   |
| 0.3556        | 64.0  | 21760 | 0.5307          | 0.6426   |
| 0.3578        | 65.0  | 22100 | 0.5326          | 0.6489   |
| 0.3578        | 66.0  | 22440 | 0.5207          | 0.5784   |
| 0.3504        | 67.0  | 22780 | 0.5271          | 0.6348   |
| 0.3588        | 68.0  | 23120 | 0.5338          | 0.6489   |
| 0.3588        | 69.0  | 23460 | 0.5386          | 0.6583   |
| 0.3553        | 70.0  | 23800 | 0.5308          | 0.6567   |
| 0.3511        | 71.0  | 24140 | 0.5325          | 0.6473   |
| 0.3511        | 72.0  | 24480 | 0.5403          | 0.6614   |
| 0.3522        | 73.0  | 24820 | 0.5319          | 0.6379   |
| 0.3534        | 74.0  | 25160 | 0.5332          | 0.6505   |
| 0.3495        | 75.0  | 25500 | 0.5343          | 0.6505   |
| 0.3495        | 76.0  | 25840 | 0.5312          | 0.6567   |
| 0.3535        | 77.0  | 26180 | 0.5356          | 0.6505   |
| 0.3491        | 78.0  | 26520 | 0.5342          | 0.6536   |
| 0.3491        | 79.0  | 26860 | 0.5327          | 0.6552   |
| 0.3518        | 80.0  | 27200 | 0.5325          | 0.6552   |
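
The per-epoch accuracy column above is the kind of value a Trainer compute_metrics callback reports. A minimal sketch, assuming the standard evaluate library (not listed under the framework versions below) and argmax decoding of the logits:

```python
import numpy as np
import evaluate  # assumption: not listed under framework versions

accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes in.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
```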

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3