20230824024310

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3005
  • Accuracy: 0.7509

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
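With a linear scheduler and no warmup steps listed (zero warmup is assumed here), the learning rate decays from 0.003 to 0 over the full run of 60 epochs × 312 steps per epoch = 18720 optimizer steps. A minimal sketch of that schedule in plain Python:

```python
def linear_lr(step, base_lr=0.003, total_steps=60 * 312):
    """Linearly decay the learning rate from base_lr to 0.

    Assumes zero warmup steps, since none are listed in the card.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(0))      # 0.003 at the start of training
print(linear_lr(9360))   # 0.0015 at the halfway point (end of epoch 30)
print(linear_lr(18720))  # 0.0 at the final step (end of epoch 60)
```

In the Trainer API this corresponds to `lr_scheduler_type="linear"` with the default `warmup_steps=0`.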

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.6323          | 0.5307   |
| 0.5669        | 2.0   | 624   | 0.4749          | 0.5415   |
| 0.5669        | 3.0   | 936   | 0.4812          | 0.5271   |
| 0.5542        | 4.0   | 1248  | 0.3917          | 0.5704   |
| 0.5146        | 5.0   | 1560  | 0.4706          | 0.5523   |
| 0.5146        | 6.0   | 1872  | 0.4418          | 0.6173   |
| 0.464         | 7.0   | 2184  | 0.3863          | 0.6462   |
| 0.464         | 8.0   | 2496  | 0.3326          | 0.6751   |
| 0.4357        | 9.0   | 2808  | 0.3896          | 0.6065   |
| 0.4268        | 10.0  | 3120  | 0.3329          | 0.6823   |
| 0.4268        | 11.0  | 3432  | 0.4012          | 0.6679   |
| 0.4077        | 12.0  | 3744  | 0.3661          | 0.7112   |
| 0.3832        | 13.0  | 4056  | 0.3640          | 0.7112   |
| 0.3832        | 14.0  | 4368  | 0.3328          | 0.7040   |
| 0.3918        | 15.0  | 4680  | 0.3398          | 0.7076   |
| 0.3918        | 16.0  | 4992  | 0.6806          | 0.6282   |
| 0.3741        | 17.0  | 5304  | 0.4620          | 0.6498   |
| 0.3627        | 18.0  | 5616  | 0.3085          | 0.7473   |
| 0.3627        | 19.0  | 5928  | 0.3018          | 0.7256   |
| 0.3392        | 20.0  | 6240  | 0.3790          | 0.6534   |
| 0.3074        | 21.0  | 6552  | 0.2964          | 0.7401   |
| 0.3074        | 22.0  | 6864  | 0.3124          | 0.7401   |
| 0.3076        | 23.0  | 7176  | 0.3907          | 0.6931   |
| 0.3076        | 24.0  | 7488  | 0.3046          | 0.7329   |
| 0.2868        | 25.0  | 7800  | 0.3494          | 0.7365   |
| 0.2757        | 26.0  | 8112  | 0.3811          | 0.7148   |
| 0.2757        | 27.0  | 8424  | 0.3061          | 0.7509   |
| 0.2688        | 28.0  | 8736  | 0.2989          | 0.7401   |
| 0.2638        | 29.0  | 9048  | 0.3090          | 0.7365   |
| 0.2638        | 30.0  | 9360  | 0.3295          | 0.7365   |
| 0.2554        | 31.0  | 9672  | 0.3185          | 0.7401   |
| 0.2554        | 32.0  | 9984  | 0.2872          | 0.7401   |
| 0.2538        | 33.0  | 10296 | 0.3178          | 0.7509   |
| 0.2404        | 34.0  | 10608 | 0.2920          | 0.7473   |
| 0.2404        | 35.0  | 10920 | 0.3001          | 0.7329   |
| 0.2342        | 36.0  | 11232 | 0.3155          | 0.7437   |
| 0.2258        | 37.0  | 11544 | 0.3324          | 0.7437   |
| 0.2258        | 38.0  | 11856 | 0.3179          | 0.7437   |
| 0.2247        | 39.0  | 12168 | 0.3276          | 0.7509   |
| 0.2247        | 40.0  | 12480 | 0.2988          | 0.7401   |
| 0.2184        | 41.0  | 12792 | 0.2916          | 0.7329   |
| 0.215         | 42.0  | 13104 | 0.3033          | 0.7401   |
| 0.215         | 43.0  | 13416 | 0.3209          | 0.7473   |
| 0.2117        | 44.0  | 13728 | 0.2994          | 0.7473   |
| 0.2035        | 45.0  | 14040 | 0.3093          | 0.7473   |
| 0.2035        | 46.0  | 14352 | 0.2984          | 0.7365   |
| 0.203         | 47.0  | 14664 | 0.2866          | 0.7401   |
| 0.203         | 48.0  | 14976 | 0.3140          | 0.7473   |
| 0.2019        | 49.0  | 15288 | 0.3158          | 0.7509   |
| 0.1937        | 50.0  | 15600 | 0.2996          | 0.7545   |
| 0.1937        | 51.0  | 15912 | 0.2814          | 0.7473   |
| 0.1988        | 52.0  | 16224 | 0.3050          | 0.7437   |
| 0.1965        | 53.0  | 16536 | 0.3073          | 0.7473   |
| 0.1965        | 54.0  | 16848 | 0.2994          | 0.7509   |
| 0.1918        | 55.0  | 17160 | 0.2985          | 0.7509   |
| 0.1918        | 56.0  | 17472 | 0.3046          | 0.7509   |
| 0.1902        | 57.0  | 17784 | 0.2991          | 0.7473   |
| 0.1879        | 58.0  | 18096 | 0.2942          | 0.7509   |
| 0.1879        | 59.0  | 18408 | 0.2976          | 0.7509   |
| 0.194         | 60.0  | 18720 | 0.3005          | 0.7509   |
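Note that the final epoch is not the best by either metric: the lowest validation loss (0.2814) occurs at epoch 51 and the highest accuracy (0.7545) at epoch 50, while the headline numbers above come from the last checkpoint. A small sketch of picking the best checkpoint from a handful of rows copied from the table:

```python
# (epoch, validation_loss, accuracy) for a few rows from the table above
rows = [
    (32, 0.2872, 0.7401),
    (47, 0.2866, 0.7401),
    (50, 0.2996, 0.7545),
    (51, 0.2814, 0.7473),
    (60, 0.3005, 0.7509),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[2])   # highest accuracy

print(best_by_loss)  # (51, 0.2814, 0.7473)
print(best_by_acc)   # (50, 0.2996, 0.7545)
```

The Trainer can do this automatically via `load_best_model_at_end=True` with `metric_for_best_model` set accordingly.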

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
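To reproduce this environment, the versions above can be pinned in a requirements file (a sketch; note the pip package for PyTorch is `torch`, and the `+cu118` build is an assumption that you install from the PyTorch cu118 wheel index):

```
transformers==4.26.1
torch==2.0.1+cu118
datasets==2.12.0
tokenizers==0.13.3
```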