
20230826052713

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a hedged usage sketch follows the results):

  • Loss: 0.4913
  • Accuracy: 0.72
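The card does not declare a pipeline type, so the snippet below is only a minimal usage sketch: it assumes the checkpoint carries a sequence-classification head (which matches the SuperGLUE task family) and uses the repo id dkqjrm/20230826052713 from this card. The specific SuperGLUE subset, and therefore the expected input format, is not stated, so the example inputs are placeholders.

```python
# Hedged sketch: the pipeline type is not declared on the card, so a
# sequence-classification head is assumed; the SuperGLUE subset (and thus
# the correct input format) is unknown.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "dkqjrm/20230826052713"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Placeholder sentence pair; most SuperGLUE tasks take two text inputs.
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```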

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
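
For reference, here is a minimal sketch of how the settings above map onto Transformers TrainingArguments. The output_dir and the per-epoch evaluation strategy are assumptions (the latter inferred from the per-epoch rows in the results table below); everything else is copied from the list.

```python
# Hypothetical reconstruction of this run's TrainingArguments from the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826052713",   # assumption: not stated on the card
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption: inferred from the results table
)
```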

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6276 | 0.57 |
| No log | 2.0 | 50 | 0.6136 | 0.63 |
| No log | 3.0 | 75 | 0.6774 | 0.66 |
| No log | 4.0 | 100 | 0.5964 | 0.64 |
| No log | 5.0 | 125 | 0.5316 | 0.62 |
| No log | 6.0 | 150 | 0.5231 | 0.62 |
| No log | 7.0 | 175 | 0.5156 | 0.63 |
| No log | 8.0 | 200 | 0.6216 | 0.64 |
| No log | 9.0 | 225 | 0.5013 | 0.71 |
| No log | 10.0 | 250 | 0.5734 | 0.7 |
| No log | 11.0 | 275 | 0.4683 | 0.66 |
| No log | 12.0 | 300 | 0.5333 | 0.73 |
| No log | 13.0 | 325 | 0.6740 | 0.69 |
| No log | 14.0 | 350 | 0.5185 | 0.71 |
| No log | 15.0 | 375 | 0.5031 | 0.71 |
| No log | 16.0 | 400 | 0.5398 | 0.71 |
| No log | 17.0 | 425 | 0.5246 | 0.73 |
| No log | 18.0 | 450 | 0.7414 | 0.69 |
| No log | 19.0 | 475 | 0.6817 | 0.72 |
| 0.7352 | 20.0 | 500 | 0.6656 | 0.71 |
| 0.7352 | 21.0 | 525 | 0.5839 | 0.76 |
| 0.7352 | 22.0 | 550 | 0.6626 | 0.76 |
| 0.7352 | 23.0 | 575 | 0.5017 | 0.75 |
| 0.7352 | 24.0 | 600 | 0.5168 | 0.74 |
| 0.7352 | 25.0 | 625 | 0.5912 | 0.78 |
| 0.7352 | 26.0 | 650 | 0.5596 | 0.77 |
| 0.7352 | 27.0 | 675 | 0.4884 | 0.77 |
| 0.7352 | 28.0 | 700 | 0.4738 | 0.73 |
| 0.7352 | 29.0 | 725 | 0.5052 | 0.76 |
| 0.7352 | 30.0 | 750 | 0.6163 | 0.74 |
| 0.7352 | 31.0 | 775 | 0.5824 | 0.74 |
| 0.7352 | 32.0 | 800 | 0.4995 | 0.72 |
| 0.7352 | 33.0 | 825 | 0.4936 | 0.71 |
| 0.7352 | 34.0 | 850 | 0.5464 | 0.72 |
| 0.7352 | 35.0 | 875 | 0.5164 | 0.74 |
| 0.7352 | 36.0 | 900 | 0.5088 | 0.75 |
| 0.7352 | 37.0 | 925 | 0.5991 | 0.75 |
| 0.7352 | 38.0 | 950 | 0.4963 | 0.73 |
| 0.7352 | 39.0 | 975 | 0.5086 | 0.72 |
| 0.411 | 40.0 | 1000 | 0.5203 | 0.73 |
| 0.411 | 41.0 | 1025 | 0.5844 | 0.74 |
| 0.411 | 42.0 | 1050 | 0.5285 | 0.74 |
| 0.411 | 43.0 | 1075 | 0.5553 | 0.74 |
| 0.411 | 44.0 | 1100 | 0.5588 | 0.71 |
| 0.411 | 45.0 | 1125 | 0.5392 | 0.72 |
| 0.411 | 46.0 | 1150 | 0.5494 | 0.72 |
| 0.411 | 47.0 | 1175 | 0.4982 | 0.76 |
| 0.411 | 48.0 | 1200 | 0.5374 | 0.72 |
| 0.411 | 49.0 | 1225 | 0.5730 | 0.73 |
| 0.411 | 50.0 | 1250 | 0.5149 | 0.72 |
| 0.411 | 51.0 | 1275 | 0.4949 | 0.72 |
| 0.411 | 52.0 | 1300 | 0.5295 | 0.73 |
| 0.411 | 53.0 | 1325 | 0.5223 | 0.72 |
| 0.411 | 54.0 | 1350 | 0.5617 | 0.71 |
| 0.411 | 55.0 | 1375 | 0.5373 | 0.72 |
| 0.411 | 56.0 | 1400 | 0.4857 | 0.73 |
| 0.411 | 57.0 | 1425 | 0.4954 | 0.72 |
| 0.411 | 58.0 | 1450 | 0.5024 | 0.72 |
| 0.411 | 59.0 | 1475 | 0.4971 | 0.74 |
| 0.318 | 60.0 | 1500 | 0.5265 | 0.73 |
| 0.318 | 61.0 | 1525 | 0.4967 | 0.71 |
| 0.318 | 62.0 | 1550 | 0.4972 | 0.73 |
| 0.318 | 63.0 | 1575 | 0.4908 | 0.72 |
| 0.318 | 64.0 | 1600 | 0.5056 | 0.74 |
| 0.318 | 65.0 | 1625 | 0.5231 | 0.74 |
| 0.318 | 66.0 | 1650 | 0.4737 | 0.75 |
| 0.318 | 67.0 | 1675 | 0.5016 | 0.72 |
| 0.318 | 68.0 | 1700 | 0.4988 | 0.73 |
| 0.318 | 69.0 | 1725 | 0.5276 | 0.74 |
| 0.318 | 70.0 | 1750 | 0.4912 | 0.73 |
| 0.318 | 71.0 | 1775 | 0.4865 | 0.72 |
| 0.318 | 72.0 | 1800 | 0.4754 | 0.73 |
| 0.318 | 73.0 | 1825 | 0.4922 | 0.73 |
| 0.318 | 74.0 | 1850 | 0.4884 | 0.74 |
| 0.318 | 75.0 | 1875 | 0.4868 | 0.73 |
| 0.318 | 76.0 | 1900 | 0.4872 | 0.73 |
| 0.318 | 77.0 | 1925 | 0.4848 | 0.72 |
| 0.318 | 78.0 | 1950 | 0.4923 | 0.72 |
| 0.318 | 79.0 | 1975 | 0.4888 | 0.73 |
| 0.287 | 80.0 | 2000 | 0.4913 | 0.72 |
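
The "No log" entries simply mean the trainer had not yet hit its step-based logging interval (training loss appears only at steps 500, 1000, 1500, and 2000, which is consistent with the default logging_steps=500). The Accuracy column is presumably produced by a compute_metrics callback passed to the Trainer; the actual callback is not published, so the following is only a plausible sketch (it additionally assumes the separate evaluate package, which is not in the version list below).

```python
# Hypothetical compute_metrics producing the Accuracy column; not the author's code.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```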

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3