20230825024049

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the results):

  • Loss: 0.6750
  • Accuracy: 0.7617
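
The card does not state which SuperGLUE task the checkpoint was trained on, nor the exact model class used for export. As a minimal, non-authoritative sketch, assuming the checkpoint is published under the repo id dkqjrm/20230825024049 (the id shown on this page) with a standard sequence-classification head:

```python
# Minimal inference sketch. Assumptions: the checkpoint lives at
# "dkqjrm/20230825024049" and exposes a standard sequence-classification
# head; the specific SuperGLUE task (and hence the label meanings) is not
# stated on this card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230825024049"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Most SuperGLUE tasks classify a pair of text segments, so encode two inputs.
inputs = tokenizer(
    "The doctor examined the patient.",  # first segment (e.g. premise)
    "The patient was examined.",         # second segment (e.g. hypothesis)
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted label id:", logits.argmax(dim=-1).item())
```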

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
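
The training script itself is not part of this card. As a sketch only, the values above map onto transformers TrainingArguments as shown below; evaluation_strategy and logging_steps are assumptions inferred from the results table (one evaluation per epoch, and a logged training loss that refreshes roughly every 500 steps), not values stated in the list:

```python
# Configuration sketch for the hyperparameters listed above.
# Assumptions: the model was trained with the Hugging Face Trainer (which
# emits cards in this format); Adam with betas=(0.9, 0.999), epsilon=1e-08
# and a linear schedule match the Trainer defaults in Transformers 4.26.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230825024049",   # assumption: output dir named after the model
    learning_rate=0.005,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption: the table logs one eval per epoch
    logging_steps=500,             # assumption: consistent with the sparse Training Loss column
)
```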

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:---:|:---:|:---:|:---:|:---:|
| No log | 1.0 | 156 | 1.0743 | 0.4729 |
| No log | 2.0 | 312 | 0.6963 | 0.5271 |
| No log | 3.0 | 468 | 0.6584 | 0.5379 |
| 0.9697 | 4.0 | 624 | 0.8075 | 0.5379 |
| 0.9697 | 5.0 | 780 | 0.6045 | 0.6173 |
| 0.9697 | 6.0 | 936 | 0.5635 | 0.6462 |
| 0.8296 | 7.0 | 1092 | 0.8051 | 0.6354 |
| 0.8296 | 8.0 | 1248 | 0.5028 | 0.6787 |
| 0.8296 | 9.0 | 1404 | 0.5830 | 0.6570 |
| 0.7235 | 10.0 | 1560 | 0.5798 | 0.7004 |
| 0.7235 | 11.0 | 1716 | 0.8434 | 0.5054 |
| 0.7235 | 12.0 | 1872 | 0.7164 | 0.6570 |
| 0.6566 | 13.0 | 2028 | 0.5957 | 0.7112 |
| 0.6566 | 14.0 | 2184 | 0.4893 | 0.7617 |
| 0.6566 | 15.0 | 2340 | 0.5230 | 0.6751 |
| 0.6566 | 16.0 | 2496 | 0.7581 | 0.6282 |
| 0.6156 | 17.0 | 2652 | 0.5233 | 0.7437 |
| 0.6156 | 18.0 | 2808 | 0.8169 | 0.5993 |
| 0.6156 | 19.0 | 2964 | 0.5691 | 0.7581 |
| 0.5597 | 20.0 | 3120 | 0.5216 | 0.6895 |
| 0.5597 | 21.0 | 3276 | 0.5625 | 0.7256 |
| 0.5597 | 22.0 | 3432 | 0.6847 | 0.6895 |
| 0.518 | 23.0 | 3588 | 0.4864 | 0.7473 |
| 0.518 | 24.0 | 3744 | 0.5535 | 0.7617 |
| 0.518 | 25.0 | 3900 | 0.7351 | 0.6931 |
| 0.4661 | 26.0 | 4056 | 0.5020 | 0.7545 |
| 0.4661 | 27.0 | 4212 | 0.5132 | 0.7581 |
| 0.4661 | 28.0 | 4368 | 0.7423 | 0.7040 |
| 0.396 | 29.0 | 4524 | 0.4947 | 0.7545 |
| 0.396 | 30.0 | 4680 | 0.6220 | 0.7437 |
| 0.396 | 31.0 | 4836 | 0.6123 | 0.7437 |
| 0.396 | 32.0 | 4992 | 0.5141 | 0.7617 |
| 0.3842 | 33.0 | 5148 | 0.6979 | 0.7220 |
| 0.3842 | 34.0 | 5304 | 0.5813 | 0.7653 |
| 0.3842 | 35.0 | 5460 | 0.5639 | 0.7545 |
| 0.3473 | 36.0 | 5616 | 0.6147 | 0.7401 |
| 0.3473 | 37.0 | 5772 | 0.7640 | 0.7184 |
| 0.3473 | 38.0 | 5928 | 0.7093 | 0.7509 |
| 0.3189 | 39.0 | 6084 | 0.5635 | 0.7509 |
| 0.3189 | 40.0 | 6240 | 0.6134 | 0.7473 |
| 0.3189 | 41.0 | 6396 | 0.6238 | 0.7437 |
| 0.2882 | 42.0 | 6552 | 0.6768 | 0.7653 |
| 0.2882 | 43.0 | 6708 | 0.6504 | 0.7581 |
| 0.2882 | 44.0 | 6864 | 0.6762 | 0.7401 |
| 0.2758 | 45.0 | 7020 | 0.7442 | 0.7726 |
| 0.2758 | 46.0 | 7176 | 0.7323 | 0.7292 |
| 0.2758 | 47.0 | 7332 | 0.6010 | 0.7509 |
| 0.2758 | 48.0 | 7488 | 0.6571 | 0.7437 |
| 0.2347 | 49.0 | 7644 | 0.6066 | 0.7617 |
| 0.2347 | 50.0 | 7800 | 0.6876 | 0.7473 |
| 0.2347 | 51.0 | 7956 | 0.5945 | 0.7762 |
| 0.2343 | 52.0 | 8112 | 0.7166 | 0.7653 |
| 0.2343 | 53.0 | 8268 | 0.7535 | 0.7509 |
| 0.2343 | 54.0 | 8424 | 0.6777 | 0.7690 |
| 0.2107 | 55.0 | 8580 | 0.5962 | 0.7545 |
| 0.2107 | 56.0 | 8736 | 0.6697 | 0.7509 |
| 0.2107 | 57.0 | 8892 | 0.6426 | 0.7545 |
| 0.2081 | 58.0 | 9048 | 0.6783 | 0.7365 |
| 0.2081 | 59.0 | 9204 | 0.9118 | 0.7401 |
| 0.2081 | 60.0 | 9360 | 0.6387 | 0.7653 |
| 0.1895 | 61.0 | 9516 | 0.7557 | 0.7509 |
| 0.1895 | 62.0 | 9672 | 0.7595 | 0.7401 |
| 0.1895 | 63.0 | 9828 | 0.6978 | 0.7437 |
| 0.1895 | 64.0 | 9984 | 0.6016 | 0.7617 |
| 0.1873 | 65.0 | 10140 | 0.6893 | 0.7401 |
| 0.1873 | 66.0 | 10296 | 0.7575 | 0.7256 |
| 0.1873 | 67.0 | 10452 | 0.6249 | 0.7617 |
| 0.177 | 68.0 | 10608 | 0.6406 | 0.7509 |
| 0.177 | 69.0 | 10764 | 0.6802 | 0.7617 |
| 0.177 | 70.0 | 10920 | 0.7479 | 0.7329 |
| 0.1645 | 71.0 | 11076 | 0.7513 | 0.7437 |
| 0.1645 | 72.0 | 11232 | 0.6490 | 0.7762 |
| 0.1645 | 73.0 | 11388 | 0.7052 | 0.7256 |
| 0.1584 | 74.0 | 11544 | 0.6589 | 0.7726 |
| 0.1584 | 75.0 | 11700 | 0.6695 | 0.7473 |
| 0.1584 | 76.0 | 11856 | 0.6239 | 0.7690 |
| 0.1554 | 77.0 | 12012 | 0.6807 | 0.7473 |
| 0.1554 | 78.0 | 12168 | 0.6740 | 0.7509 |
| 0.1554 | 79.0 | 12324 | 0.6912 | 0.7473 |
| 0.1554 | 80.0 | 12480 | 0.6750 | 0.7617 |
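
The metric code is likewise not included in the card. A plausible sketch of the hook behind the Accuracy column, assuming plain top-1 accuracy computed in a standard Trainer compute_metrics callback (the actual implementation may differ):

```python
# Sketch of a compute_metrics hook that would yield the Accuracy column.
# Assumption: accuracy is top-1 agreement between the argmax of the logits
# and the gold labels on the evaluation set.
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Passed to the Trainer as Trainer(..., compute_metrics=compute_metrics),
# which here evaluates once per epoch.
```

Note that the headline accuracy of 0.7617 is the final (epoch 80) checkpoint's score; the best validation accuracy in the table is 0.7762, reached at epochs 51 and 72.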

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3