
dkqjrm/20230825183837

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

  • Loss: 0.5509
  • Accuracy: 0.7401
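
Because the reported metric is accuracy, the checkpoint presumably carries a sequence-classification head and can be loaded with AutoModelForSequenceClassification. The sketch below is a minimal inference example under that assumption; the exact SuperGLUE subtask, the label meanings, and the input sentences are hypothetical, not recorded in this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repo id taken from this card; the classification head is an assumption
# based on the accuracy metric.
model_id = "dkqjrm/20230825183837"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Most SuperGLUE tasks are sentence-pair classification, so a pair is
# encoded here; both sentences are hypothetical example inputs.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on the mat.",
    return_tensors="pt",
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits

print(logits.softmax(dim=-1))        # class probabilities
print(logits.argmax(dim=-1).item())  # predicted class id (label names unknown)
```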

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative reproduction sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
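
These values map directly onto transformers.TrainingArguments. The sketch below is a minimal, hypothetical reproduction, not the author's actual script: the train/eval datasets are placeholders, the output directory name is assumed, and all unlisted options are left at their Transformers 4.26 defaults, which match the Adam betas and epsilon shown above.

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Values copied from the hyperparameter list above.
args = TrainingArguments(
    output_dir="20230825183837",       # assumed output directory name
    learning_rate=5e-3,                # 0.005 as listed (unusually high for BERT fine-tuning)
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",       # the card logs validation metrics once per epoch
)

model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased")

# Placeholders: tokenized SuperGLUE splits must be prepared separately.
train_dataset = None
eval_dataset = None

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
# trainer.train()  # uncomment once real datasets are supplied
```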

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 1.0350          | 0.5307   |
| No log        | 2.0   | 312   | 0.7083          | 0.5199   |
| No log        | 3.0   | 468   | 0.8268          | 0.4801   |
| 0.9653        | 4.0   | 624   | 0.7385          | 0.5199   |
| 0.9653        | 5.0   | 780   | 0.6701          | 0.5271   |
| 0.9653        | 6.0   | 936   | 0.6090          | 0.6029   |
| 0.8296        | 7.0   | 1092  | 0.5400          | 0.6282   |
| 0.8296        | 8.0   | 1248  | 0.5084          | 0.6715   |
| 0.8296        | 9.0   | 1404  | 0.5534          | 0.6606   |
| 0.7744        | 10.0  | 1560  | 0.4802          | 0.6895   |
| 0.7744        | 11.0  | 1716  | 0.5757          | 0.6715   |
| 0.7744        | 12.0  | 1872  | 0.5599          | 0.6787   |
| 0.6735        | 13.0  | 2028  | 0.4614          | 0.7220   |
| 0.6735        | 14.0  | 2184  | 0.4656          | 0.7004   |
| 0.6735        | 15.0  | 2340  | 0.5463          | 0.6859   |
| 0.6735        | 16.0  | 2496  | 0.5148          | 0.6968   |
| 0.642         | 17.0  | 2652  | 0.4414          | 0.7292   |
| 0.642         | 18.0  | 2808  | 0.6131          | 0.6931   |
| 0.642         | 19.0  | 2964  | 0.4674          | 0.7184   |
| 0.6495        | 20.0  | 3120  | 0.5114          | 0.7004   |
| 0.6495        | 21.0  | 3276  | 0.4827          | 0.7365   |
| 0.6495        | 22.0  | 3432  | 0.7846          | 0.6245   |
| 0.5629        | 23.0  | 3588  | 0.4956          | 0.7148   |
| 0.5629        | 24.0  | 3744  | 0.4705          | 0.7617   |
| 0.5629        | 25.0  | 3900  | 0.4782          | 0.7220   |
| 0.5208        | 26.0  | 4056  | 0.4177          | 0.7365   |
| 0.5208        | 27.0  | 4212  | 0.6597          | 0.6931   |
| 0.5208        | 28.0  | 4368  | 0.5945          | 0.6931   |
| 0.5051        | 29.0  | 4524  | 0.5733          | 0.7184   |
| 0.5051        | 30.0  | 4680  | 0.4994          | 0.7437   |
| 0.5051        | 31.0  | 4836  | 0.5630          | 0.6895   |
| 0.5051        | 32.0  | 4992  | 0.5061          | 0.7437   |
| 0.4822        | 33.0  | 5148  | 0.5961          | 0.6968   |
| 0.4822        | 34.0  | 5304  | 0.5072          | 0.7329   |
| 0.4822        | 35.0  | 5460  | 0.5716          | 0.7473   |
| 0.4437        | 36.0  | 5616  | 0.5670          | 0.7076   |
| 0.4437        | 37.0  | 5772  | 0.5414          | 0.7112   |
| 0.4437        | 38.0  | 5928  | 0.5748          | 0.6931   |
| 0.436         | 39.0  | 6084  | 0.5068          | 0.7545   |
| 0.436         | 40.0  | 6240  | 0.5532          | 0.7076   |
| 0.436         | 41.0  | 6396  | 0.5705          | 0.7545   |
| 0.3882        | 42.0  | 6552  | 0.5622          | 0.7545   |
| 0.3882        | 43.0  | 6708  | 0.5511          | 0.7112   |
| 0.3882        | 44.0  | 6864  | 0.5306          | 0.7473   |
| 0.3639        | 45.0  | 7020  | 0.5418          | 0.7148   |
| 0.3639        | 46.0  | 7176  | 0.5856          | 0.7256   |
| 0.3639        | 47.0  | 7332  | 0.5920          | 0.7581   |
| 0.3639        | 48.0  | 7488  | 0.6323          | 0.7112   |
| 0.3344        | 49.0  | 7644  | 0.5837          | 0.7256   |
| 0.3344        | 50.0  | 7800  | 0.5591          | 0.7329   |
| 0.3344        | 51.0  | 7956  | 0.6241          | 0.7401   |
| 0.3131        | 52.0  | 8112  | 0.5855          | 0.7365   |
| 0.3131        | 53.0  | 8268  | 0.5593          | 0.7401   |
| 0.3131        | 54.0  | 8424  | 0.5920          | 0.7401   |
| 0.319         | 55.0  | 8580  | 0.5000          | 0.7401   |
| 0.319         | 56.0  | 8736  | 0.6601          | 0.7004   |
| 0.319         | 57.0  | 8892  | 0.7536          | 0.7076   |
| 0.2995        | 58.0  | 9048  | 0.5308          | 0.7256   |
| 0.2995        | 59.0  | 9204  | 0.7136          | 0.7365   |
| 0.2995        | 60.0  | 9360  | 0.5192          | 0.7581   |
| 0.2865        | 61.0  | 9516  | 0.5491          | 0.7365   |
| 0.2865        | 62.0  | 9672  | 0.5884          | 0.7292   |
| 0.2865        | 63.0  | 9828  | 0.5730          | 0.7329   |
| 0.2865        | 64.0  | 9984  | 0.5539          | 0.7365   |
| 0.2779        | 65.0  | 10140 | 0.5626          | 0.7401   |
| 0.2779        | 66.0  | 10296 | 0.5826          | 0.7545   |
| 0.2779        | 67.0  | 10452 | 0.6070          | 0.7473   |
| 0.2621        | 68.0  | 10608 | 0.5399          | 0.7509   |
| 0.2621        | 69.0  | 10764 | 0.5598          | 0.7437   |
| 0.2621        | 70.0  | 10920 | 0.5688          | 0.7401   |
| 0.2549        | 71.0  | 11076 | 0.5407          | 0.7437   |
| 0.2549        | 72.0  | 11232 | 0.5516          | 0.7473   |
| 0.2549        | 73.0  | 11388 | 0.5699          | 0.7148   |
| 0.2453        | 74.0  | 11544 | 0.5284          | 0.7437   |
| 0.2453        | 75.0  | 11700 | 0.5615          | 0.7401   |
| 0.2453        | 76.0  | 11856 | 0.5336          | 0.7365   |
| 0.2478        | 77.0  | 12012 | 0.5502          | 0.7401   |
| 0.2478        | 78.0  | 12168 | 0.5507          | 0.7401   |
| 0.2478        | 79.0  | 12324 | 0.5451          | 0.7401   |
| 0.2478        | 80.0  | 12480 | 0.5509          | 0.7401   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3