
20230826083404

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.5588
  • Accuracy: 0.56
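
Below is a minimal loading sketch. The card does not state a pipeline type, so this assumes a sequence-classification head (typical for BERT models fine-tuned on SuperGLUE tasks); the example sentence pair is a placeholder, not from the training data.

```python
# Hedged sketch: assumes AutoModelForSequenceClassification is the right
# head for this checkpoint, since the card does not state the task.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826083404"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder sentence pair; most SuperGLUE tasks take text-pair inputs.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```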

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
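
As a concrete reference, the list above maps onto transformers.TrainingArguments roughly as follows. This is a hedged sketch: the output_dir name is hypothetical, and the model, dataset, and metric wiring of the actual run is not shown in the card.

```python
# Sketch of the reported hyperparameters as TrainingArguments
# (Transformers 4.26.1). Only the values listed above are from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826083404",   # hypothetical; the real path is not given
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```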

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6769 | 0.61 |
| No log | 2.0 | 50 | 0.5349 | 0.59 |
| No log | 3.0 | 75 | 0.6615 | 0.58 |
| No log | 4.0 | 100 | 0.6596 | 0.64 |
| No log | 5.0 | 125 | 0.5523 | 0.71 |
| No log | 6.0 | 150 | 0.8447 | 0.67 |
| No log | 7.0 | 175 | 0.7506 | 0.66 |
| No log | 8.0 | 200 | 0.8463 | 0.68 |
| No log | 9.0 | 225 | 0.9064 | 0.56 |
| No log | 10.0 | 250 | 0.5533 | 0.58 |
| No log | 11.0 | 275 | 0.5701 | 0.41 |
| No log | 12.0 | 300 | 0.5593 | 0.51 |
| No log | 13.0 | 325 | 0.5599 | 0.52 |
| No log | 14.0 | 350 | 0.5619 | 0.37 |
| No log | 15.0 | 375 | 0.5591 | 0.56 |
| No log | 16.0 | 400 | 0.5569 | 0.55 |
| No log | 17.0 | 425 | 0.5511 | 0.56 |
| No log | 18.0 | 450 | 0.5599 | 0.52 |
| No log | 19.0 | 475 | 0.5561 | 0.59 |
| 1.4827 | 20.0 | 500 | 0.5577 | 0.57 |
| 1.4827 | 21.0 | 525 | 0.5537 | 0.58 |
| 1.4827 | 22.0 | 550 | 0.5616 | 0.43 |
| 1.4827 | 23.0 | 575 | 0.5607 | 0.34 |
| 1.4827 | 24.0 | 600 | 0.5616 | 0.39 |
| 1.4827 | 25.0 | 625 | 0.5597 | 0.56 |
| 1.4827 | 26.0 | 650 | 0.5623 | 0.41 |
| 1.4827 | 27.0 | 675 | 0.5612 | 0.43 |
| 1.4827 | 28.0 | 700 | 0.5573 | 0.57 |
| 1.4827 | 29.0 | 725 | 0.5631 | 0.42 |
| 1.4827 | 30.0 | 750 | 0.5594 | 0.51 |
| 1.4827 | 31.0 | 775 | 0.5593 | 0.56 |
| 1.4827 | 32.0 | 800 | 0.5646 | 0.43 |
| 1.4827 | 33.0 | 825 | 0.5664 | 0.44 |
| 1.4827 | 34.0 | 850 | 0.5597 | 0.56 |
| 1.4827 | 35.0 | 875 | 0.5629 | 0.41 |
| 1.4827 | 36.0 | 900 | 0.5610 | 0.43 |
| 1.4827 | 37.0 | 925 | 0.5572 | 0.58 |
| 1.4827 | 38.0 | 950 | 0.5592 | 0.6 |
| 1.4827 | 39.0 | 975 | 0.5553 | 0.59 |
| 1.1505 | 40.0 | 1000 | 0.5597 | 0.58 |
| 1.1505 | 41.0 | 1025 | 0.5570 | 0.62 |
| 1.1505 | 42.0 | 1050 | 0.5582 | 0.6 |
| 1.1505 | 43.0 | 1075 | 0.5601 | 0.46 |
| 1.1505 | 44.0 | 1100 | 0.5598 | 0.55 |
| 1.1505 | 45.0 | 1125 | 0.5574 | 0.59 |
| 1.1505 | 46.0 | 1150 | 0.5591 | 0.52 |
| 1.1505 | 47.0 | 1175 | 0.5601 | 0.5 |
| 1.1505 | 48.0 | 1200 | 0.5593 | 0.56 |
| 1.1505 | 49.0 | 1225 | 0.5600 | 0.48 |
| 1.1505 | 50.0 | 1250 | 0.5620 | 0.39 |
| 1.1505 | 51.0 | 1275 | 0.5598 | 0.51 |
| 1.1505 | 52.0 | 1300 | 0.5616 | 0.39 |
| 1.1505 | 53.0 | 1325 | 0.5601 | 0.43 |
| 1.1505 | 54.0 | 1350 | 0.5617 | 0.4 |
| 1.1505 | 55.0 | 1375 | 0.5619 | 0.41 |
| 1.1505 | 56.0 | 1400 | 0.5625 | 0.39 |
| 1.1505 | 57.0 | 1425 | 0.5591 | 0.56 |
| 1.1505 | 58.0 | 1450 | 0.5588 | 0.59 |
| 1.1505 | 59.0 | 1475 | 0.5580 | 0.59 |
| 0.9071 | 60.0 | 1500 | 0.5584 | 0.62 |
| 0.9071 | 61.0 | 1525 | 0.5590 | 0.58 |
| 0.9071 | 62.0 | 1550 | 0.5585 | 0.57 |
| 0.9071 | 63.0 | 1575 | 0.5586 | 0.59 |
| 0.9071 | 64.0 | 1600 | 0.5589 | 0.57 |
| 0.9071 | 65.0 | 1625 | 0.5587 | 0.59 |
| 0.9071 | 66.0 | 1650 | 0.5588 | 0.61 |
| 0.9071 | 67.0 | 1675 | 0.5592 | 0.57 |
| 0.9071 | 68.0 | 1700 | 0.5579 | 0.58 |
| 0.9071 | 69.0 | 1725 | 0.5586 | 0.56 |
| 0.9071 | 70.0 | 1750 | 0.5590 | 0.57 |
| 0.9071 | 71.0 | 1775 | 0.5590 | 0.57 |
| 0.9071 | 72.0 | 1800 | 0.5590 | 0.59 |
| 0.9071 | 73.0 | 1825 | 0.5591 | 0.56 |
| 0.9071 | 74.0 | 1850 | 0.5586 | 0.56 |
| 0.9071 | 75.0 | 1875 | 0.5590 | 0.56 |
| 0.9071 | 76.0 | 1900 | 0.5592 | 0.57 |
| 0.9071 | 77.0 | 1925 | 0.5587 | 0.53 |
| 0.9071 | 78.0 | 1950 | 0.5588 | 0.56 |
| 0.9071 | 79.0 | 1975 | 0.5589 | 0.58 |
| 0.7248 | 80.0 | 2000 | 0.5588 | 0.56 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
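
To reproduce results against this exact stack, one option is to assert the versions above at runtime. A minimal sketch, assuming all four packages are importable:

```python
# Sanity-check the environment against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.26.1",
    torch: "2.0.1+cu118",
    datasets: "2.12.0",
    tokenizers: "0.13.3",
}
for module, version in expected.items():
    assert module.__version__ == version, f"{module.__name__}: {module.__version__} != {version}"
```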