
20230831092825

This model, dkqjrm/20230831092825, is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.5298
  • Accuracy: 0.6771
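
The card does not state which SuperGLUE subtask the checkpoint was trained on, so the following is only a minimal usage sketch: it assumes a standard sequence-classification head, and the sentence pair is an illustrative placeholder rather than a real task input.

```python
# Minimal usage sketch. Assumes a sequence-classification head; the
# SuperGLUE subtask is not stated on this card, so the sentence pair
# below is an illustrative placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831092825"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Most SuperGLUE tasks are sentence-pair classification tasks.
inputs = tokenizer(
    "The cat sat on the mat.",
    "There is a cat on the mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```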

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
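
For reference, here is a sketch of a TrainingArguments object mirroring the list above. The output_dir value is an assumption, and the dataset/Trainer wiring is omitted because the card does not name the SuperGLUE subtask.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
# output_dir is an assumption; dataset and Trainer wiring are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831092825",  # assumed; not stated on the card
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    adam_beta1=0.9,    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```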

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-:|:-:|:-:|:-:|:-:|
| No log | 1.0 | 340 | 0.4989 | 0.5 |
| 0.5076 | 2.0 | 680 | 0.4922 | 0.5 |
| 0.5029 | 3.0 | 1020 | 0.4980 | 0.5 |
| 0.5029 | 4.0 | 1360 | 0.4881 | 0.5125 |
| 0.4992 | 5.0 | 1700 | 0.5067 | 0.5 |
| 0.4818 | 6.0 | 2040 | 0.4919 | 0.5251 |
| 0.4818 | 7.0 | 2380 | 0.5045 | 0.5392 |
| 0.4719 | 8.0 | 2720 | 0.4695 | 0.5 |
| 0.4636 | 9.0 | 3060 | 0.4805 | 0.5 |
| 0.4636 | 10.0 | 3400 | 0.5002 | 0.5 |
| 0.4501 | 11.0 | 3740 | 0.5665 | 0.6646 |
| 0.4418 | 12.0 | 4080 | 0.5283 | 0.6897 |
| 0.4418 | 13.0 | 4420 | 0.4705 | 0.5 |
| 0.4352 | 14.0 | 4760 | 0.5644 | 0.6630 |
| 0.4302 | 15.0 | 5100 | 0.5080 | 0.6505 |
| 0.4302 | 16.0 | 5440 | 0.5084 | 0.6897 |
| 0.4305 | 17.0 | 5780 | 0.5006 | 0.6599 |
| 0.4203 | 18.0 | 6120 | 0.5246 | 0.6928 |
| 0.4203 | 19.0 | 6460 | 0.4958 | 0.6583 |
| 0.4166 | 20.0 | 6800 | 0.5595 | 0.6630 |
| 0.4117 | 21.0 | 7140 | 0.4796 | 0.5 |
| 0.4117 | 22.0 | 7480 | 0.4820 | 0.5 |
| 0.4131 | 23.0 | 7820 | 0.5158 | 0.6755 |
| 0.406 | 24.0 | 8160 | 0.4801 | 0.5 |
| 0.4062 | 25.0 | 8500 | 0.5471 | 0.6646 |
| 0.4062 | 26.0 | 8840 | 0.4904 | 0.5 |
| 0.4021 | 27.0 | 9180 | 0.4880 | 0.5 |
| 0.3971 | 28.0 | 9520 | 0.5019 | 0.6646 |
| 0.3971 | 29.0 | 9860 | 0.4825 | 0.5 |
| 0.3936 | 30.0 | 10200 | 0.5069 | 0.6693 |
| 0.3907 | 31.0 | 10540 | 0.5472 | 0.6693 |
| 0.3907 | 32.0 | 10880 | 0.4886 | 0.5 |
| 0.3906 | 33.0 | 11220 | 0.5531 | 0.6693 |
| 0.3888 | 34.0 | 11560 | 0.5023 | 0.5266 |
| 0.3888 | 35.0 | 11900 | 0.4896 | 0.5 |
| 0.387 | 36.0 | 12240 | 0.4985 | 0.5 |
| 0.3836 | 37.0 | 12580 | 0.5309 | 0.6834 |
| 0.3836 | 38.0 | 12920 | 0.5402 | 0.6818 |
| 0.3792 | 39.0 | 13260 | 0.4854 | 0.5 |
| 0.3789 | 40.0 | 13600 | 0.4971 | 0.5 |
| 0.3789 | 41.0 | 13940 | 0.5368 | 0.6803 |
| 0.3775 | 42.0 | 14280 | 0.4958 | 0.5047 |
| 0.3753 | 43.0 | 14620 | 0.5139 | 0.6897 |
| 0.3753 | 44.0 | 14960 | 0.5224 | 0.6834 |
| 0.3795 | 45.0 | 15300 | 0.5119 | 0.6865 |
| 0.3743 | 46.0 | 15640 | 0.5120 | 0.6740 |
| 0.3743 | 47.0 | 15980 | 0.5049 | 0.5204 |
| 0.3726 | 48.0 | 16320 | 0.5026 | 0.5 |
| 0.3683 | 49.0 | 16660 | 0.5137 | 0.6646 |
| 0.3707 | 50.0 | 17000 | 0.5088 | 0.6129 |
| 0.3707 | 51.0 | 17340 | 0.5608 | 0.6646 |
| 0.3654 | 52.0 | 17680 | 0.5217 | 0.6803 |
| 0.3684 | 53.0 | 18020 | 0.5236 | 0.6740 |
| 0.3684 | 54.0 | 18360 | 0.5135 | 0.5016 |
| 0.3663 | 55.0 | 18700 | 0.5192 | 0.6818 |
| 0.3669 | 56.0 | 19040 | 0.5212 | 0.6160 |
| 0.3669 | 57.0 | 19380 | 0.5320 | 0.6740 |
| 0.3641 | 58.0 | 19720 | 0.5344 | 0.6646 |
| 0.3628 | 59.0 | 20060 | 0.4991 | 0.5 |
| 0.3628 | 60.0 | 20400 | 0.5341 | 0.6661 |
| 0.3612 | 61.0 | 20740 | 0.5039 | 0.5 |
| 0.3608 | 62.0 | 21080 | 0.5267 | 0.6379 |
| 0.3608 | 63.0 | 21420 | 0.5249 | 0.6364 |
| 0.3599 | 64.0 | 21760 | 0.5226 | 0.6599 |
| 0.3616 | 65.0 | 22100 | 0.5370 | 0.6834 |
| 0.3616 | 66.0 | 22440 | 0.5109 | 0.5 |
| 0.3543 | 67.0 | 22780 | 0.5368 | 0.6740 |
| 0.3616 | 68.0 | 23120 | 0.5236 | 0.5690 |
| 0.3616 | 69.0 | 23460 | 0.5300 | 0.6693 |
| 0.3578 | 70.0 | 23800 | 0.5441 | 0.6583 |
| 0.3541 | 71.0 | 24140 | 0.5310 | 0.6724 |
| 0.3541 | 72.0 | 24480 | 0.5346 | 0.6693 |
| 0.354 | 73.0 | 24820 | 0.5338 | 0.6630 |
| 0.355 | 74.0 | 25160 | 0.5279 | 0.6599 |
| 0.3536 | 75.0 | 25500 | 0.5280 | 0.6552 |
| 0.3536 | 76.0 | 25840 | 0.5328 | 0.6693 |
| 0.3539 | 77.0 | 26180 | 0.5231 | 0.5376 |
| 0.3527 | 78.0 | 26520 | 0.5282 | 0.6646 |
| 0.3527 | 79.0 | 26860 | 0.5250 | 0.6364 |
| 0.3535 | 80.0 | 27200 | 0.5298 | 0.6771 |
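
The accuracy column above is characteristic of a compute_metrics callback passed to Trainer. The actual training script is not shown on this card, so the following is only a plausible sketch using the evaluate library's accuracy metric.

```python
# Plausible sketch of the accuracy metric wiring for Trainer; the
# actual script behind this card is not shown.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```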

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
