20230822185237

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3335
  • Accuracy: 0.6498
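
The card does not include usage code, but a checkpoint like this can typically be loaded with the transformers Auto classes. Below is a minimal sketch, assuming the checkpoint lives on the Hugging Face Hub under dkqjrm/20230822185237 and carries a standard sequence-classification head; the card does not say which SuperGLUE task it targets, so the sentence-pair input format is an assumption.

```python
# Minimal loading sketch; the task head and input format are assumptions,
# since the card does not state which SuperGLUE task the model was tuned on.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822185237"  # repository name from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Hypothetical sentence-pair input (most SuperGLUE tasks are pair classification).
inputs = tokenizer("A first sentence.", "A second sentence.", return_tensors="pt")
predicted_class = model(**inputs).logits.argmax(dim=-1).item()
print(predicted_class)
```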

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto code follows the list):

  • learning_rate: 0.002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
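
For readers who want to reproduce this configuration, here is a sketch of how the values above map onto transformers.TrainingArguments. This is a reconstruction under stated assumptions (output directory name, per-device batch sizes, per-epoch evaluation cadence), not the author's actual training script.

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters; output_dir and the
# evaluation cadence are assumptions, not taken from the card.
training_args = TrainingArguments(
    output_dir="20230822185237",   # hypothetical
    learning_rate=2e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",   # the results table shows one eval per epoch
)
```

Note that 2e-3 is an unusually large learning rate for fine-tuning a BERT-large model (typical values fall in the 1e-5 to 5e-5 range), which may account for the noisy validation accuracy early in the results table below.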

Training results

Training Loss   Epoch   Step    Validation Loss   Accuracy
No log          1.0     312     0.3589            0.5415
0.4381          2.0     624     0.3585            0.5560
0.4381          3.0     936     0.4824            0.4729
0.4251          4.0     1248    0.3497            0.5740
0.4013          5.0     1560    0.5515            0.5307
0.4013          6.0     1872    0.5300            0.5343
0.4064          7.0     2184    0.3515            0.4982
0.4064          8.0     2496    0.3456            0.5704
0.4121          9.0     2808    0.3522            0.5632
0.4048          10.0    3120    0.3437            0.5632
0.4048          11.0    3432    0.3483            0.5668
0.4035          12.0    3744    0.3952            0.4657
0.3797          13.0    4056    0.3535            0.4801
0.3797          14.0    4368    0.3443            0.5993
0.3657          15.0    4680    0.3431            0.5379
0.3657          16.0    4992    0.3478            0.5993
0.3615          17.0    5304    0.3475            0.6173
0.3573          18.0    5616    0.3539            0.6101
0.3573          19.0    5928    0.3384            0.6101
0.3552          20.0    6240    0.3483            0.6245
0.3545          21.0    6552    0.3359            0.6173
0.3545          22.0    6864    0.3844            0.5740
0.3490          23.0    7176    0.3436            0.6498
0.3490          24.0    7488    0.3422            0.6209
0.3510          25.0    7800    0.3495            0.6318
0.3471          26.0    8112    0.3498            0.6101
0.3471          27.0    8424    0.3316            0.6462
0.3468          28.0    8736    0.3322            0.6751
0.3459          29.0    9048    0.3354            0.6390
0.3459          30.0    9360    0.3353            0.6390
0.3440          31.0    9672    0.3383            0.6354
0.3440          32.0    9984    0.3329            0.6245
0.3435          33.0    10296   0.3411            0.6390
0.3408          34.0    10608   0.3414            0.6354
0.3408          35.0    10920   0.3319            0.6534
0.3401          36.0    11232   0.3347            0.6282
0.3406          37.0    11544   0.3382            0.6137
0.3406          38.0    11856   0.3355            0.6245
0.3378          39.0    12168   0.3416            0.6245
0.3378          40.0    12480   0.3422            0.6209
0.3386          41.0    12792   0.3388            0.6390
0.3362          42.0    13104   0.3330            0.6390
0.3362          43.0    13416   0.3393            0.6282
0.3373          44.0    13728   0.3340            0.6282
0.3337          45.0    14040   0.3318            0.6390
0.3337          46.0    14352   0.3323            0.6354
0.3332          47.0    14664   0.3301            0.6643
0.3332          48.0    14976   0.3422            0.6282
0.3315          49.0    15288   0.3348            0.6570
0.3300          50.0    15600   0.3366            0.6462
0.3300          51.0    15912   0.3308            0.6570
0.3310          52.0    16224   0.3298            0.6606
0.3295          53.0    16536   0.3377            0.6498
0.3295          54.0    16848   0.3439            0.6462
0.3282          55.0    17160   0.3326            0.6570
0.3282          56.0    17472   0.3356            0.6498
0.3291          57.0    17784   0.3309            0.6570
0.3278          58.0    18096   0.3333            0.6498
0.3278          59.0    18408   0.3324            0.6498
0.3292          60.0    18720   0.3335            0.6498
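
The evaluation results quoted at the top of the card are those of the final epoch (60); the best validation accuracy during training was 0.6751, reached at epoch 28 (step 8736). The "No log" entry at epoch 1 indicates that no training loss had been logged before the first evaluation.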

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3