
dkqjrm/20230822124929

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3407
  • Accuracy: 0.6570
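
The card does not state which SuperGLUE subtask the checkpoint was fine-tuned on, so the following is only a minimal loading sketch, assuming the checkpoint carries a sequence-classification head and takes a sentence pair as input; the example texts are placeholders.

```python
# Minimal sketch: load the checkpoint for inference.
# Assumptions: a sequence-classification head and sentence-pair input
# (the card does not name the SuperGLUE subtask).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230822124929"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("An example premise.", "An example hypothesis.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```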

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
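
As a rough sketch, these settings map onto a transformers TrainingArguments configuration as follows. The output directory and the per-epoch evaluation strategy are assumptions (the card only shows one evaluation per epoch), and the Adam betas/epsilon listed above match the Trainer defaults, set explicitly here for clarity.

```python
# Sketch of a Trainer configuration matching the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822124929",   # hypothetical output path
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Trainer defaults, as listed in the card
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",   # assumption: matches the per-epoch log below
)
```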

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3734          | 0.5307   |
| 0.4216        | 2.0   | 624   | 0.3802          | 0.4729   |
| 0.4216        | 3.0   | 936   | 0.4299          | 0.4765   |
| 0.3883        | 4.0   | 1248  | 0.3490          | 0.5451   |
| 0.3918        | 5.0   | 1560  | 0.3461          | 0.5884   |
| 0.3918        | 6.0   | 1872  | 0.3599          | 0.5523   |
| 0.3764        | 7.0   | 2184  | 0.3565          | 0.5451   |
| 0.3764        | 8.0   | 2496  | 0.3611          | 0.5018   |
| 0.3794        | 9.0   | 2808  | 0.4040          | 0.5415   |
| 0.3778        | 10.0  | 3120  | 0.3622          | 0.4729   |
| 0.3778        | 11.0  | 3432  | 0.4954          | 0.4693   |
| 0.3813        | 12.0  | 3744  | 0.3602          | 0.4765   |
| 0.3718        | 13.0  | 4056  | 0.3453          | 0.5415   |
| 0.3718        | 14.0  | 4368  | 0.3640          | 0.5343   |
| 0.3701        | 15.0  | 4680  | 0.3589          | 0.4838   |
| 0.3701        | 16.0  | 4992  | 0.3700          | 0.5632   |
| 0.371         | 17.0  | 5304  | 0.4147          | 0.5343   |
| 0.3644        | 18.0  | 5616  | 0.3505          | 0.5740   |
| 0.3644        | 19.0  | 5928  | 0.3736          | 0.4874   |
| 0.3667        | 20.0  | 6240  | 0.3637          | 0.5704   |
| 0.3629        | 21.0  | 6552  | 0.3412          | 0.6209   |
| 0.3629        | 22.0  | 6864  | 0.3451          | 0.6282   |
| 0.3574        | 23.0  | 7176  | 0.3626          | 0.6065   |
| 0.3574        | 24.0  | 7488  | 0.3732          | 0.4874   |
| 0.3565        | 25.0  | 7800  | 0.3427          | 0.6173   |
| 0.3525        | 26.0  | 8112  | 0.3855          | 0.5812   |
| 0.3525        | 27.0  | 8424  | 0.3384          | 0.6498   |
| 0.3523        | 28.0  | 8736  | 0.3408          | 0.6282   |
| 0.3505        | 29.0  | 9048  | 0.3548          | 0.6101   |
| 0.3505        | 30.0  | 9360  | 0.3861          | 0.5921   |
| 0.3509        | 31.0  | 9672  | 0.3710          | 0.5993   |
| 0.3509        | 32.0  | 9984  | 0.3897          | 0.5993   |
| 0.3494        | 33.0  | 10296 | 0.3535          | 0.6354   |
| 0.3459        | 34.0  | 10608 | 0.3389          | 0.6282   |
| 0.3459        | 35.0  | 10920 | 0.3397          | 0.6209   |
| 0.3429        | 36.0  | 11232 | 0.3450          | 0.6101   |
| 0.3432        | 37.0  | 11544 | 0.3925          | 0.6065   |
| 0.3432        | 38.0  | 11856 | 0.3294          | 0.6715   |
| 0.341         | 39.0  | 12168 | 0.3442          | 0.6390   |
| 0.341         | 40.0  | 12480 | 0.3421          | 0.6462   |
| 0.3392        | 41.0  | 12792 | 0.3371          | 0.6390   |
| 0.3392        | 42.0  | 13104 | 0.3326          | 0.6534   |
| 0.3392        | 43.0  | 13416 | 0.3714          | 0.6282   |
| 0.337         | 44.0  | 13728 | 0.3535          | 0.6245   |
| 0.3352        | 45.0  | 14040 | 0.3548          | 0.6245   |
| 0.3352        | 46.0  | 14352 | 0.3361          | 0.6570   |
| 0.3335        | 47.0  | 14664 | 0.3329          | 0.6859   |
| 0.3335        | 48.0  | 14976 | 0.3423          | 0.6462   |
| 0.3329        | 49.0  | 15288 | 0.3356          | 0.6534   |
| 0.3308        | 50.0  | 15600 | 0.3398          | 0.6643   |
| 0.3308        | 51.0  | 15912 | 0.3374          | 0.6679   |
| 0.3291        | 52.0  | 16224 | 0.3315          | 0.6787   |
| 0.3284        | 53.0  | 16536 | 0.3650          | 0.6318   |
| 0.3284        | 54.0  | 16848 | 0.3537          | 0.6282   |
| 0.3257        | 55.0  | 17160 | 0.3480          | 0.6426   |
| 0.3257        | 56.0  | 17472 | 0.3424          | 0.6570   |
| 0.3274        | 57.0  | 17784 | 0.3413          | 0.6679   |
| 0.3265        | 58.0  | 18096 | 0.3442          | 0.6390   |
| 0.3265        | 59.0  | 18408 | 0.3417          | 0.6534   |
| 0.326         | 60.0  | 18720 | 0.3407          | 0.6570   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
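
To reproduce results, it may help to confirm a local environment matches these versions; the expected strings below are taken directly from the list above.

```python
# Quick environment check against the versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {want}")
```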