
20230826073557

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the metrics):

  • Loss: 0.4014
  • Accuracy: 0.72
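The card does not state the model's pipeline type, so the sketch below assumes the checkpoint carries a sequence-classification head (consistent with the accuracy metric reported above); the input string is a placeholder, not an example from the training data.

```python
# Minimal usage sketch, assuming a sequence-classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826073557"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder input; replace with text formatted for the underlying task.
inputs = tokenizer("Replace this with task input.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```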

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
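A sketch of the reported hyperparameters expressed as transformers TrainingArguments. This assumes single-device training, so the reported train_batch_size maps to per_device_train_batch_size; output_dir is a placeholder, and the Adam settings shown match both the values reported above and the library defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # placeholder path
    learning_rate=0.02,
    per_device_train_batch_size=16,    # reported train_batch_size: 16
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                 # epsilon=1e-08
)
```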

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.4958 | 0.46 |
| No log | 2.0 | 50 | 0.5956 | 0.54 |
| No log | 3.0 | 75 | 0.5377 | 0.45 |
| No log | 4.0 | 100 | 0.4202 | 0.61 |
| No log | 5.0 | 125 | 0.4367 | 0.44 |
| No log | 6.0 | 150 | 0.4370 | 0.51 |
| No log | 7.0 | 175 | 0.4207 | 0.66 |
| No log | 8.0 | 200 | 0.4423 | 0.58 |
| No log | 9.0 | 225 | 0.4107 | 0.61 |
| No log | 10.0 | 250 | 0.4332 | 0.64 |
| No log | 11.0 | 275 | 0.4055 | 0.6 |
| No log | 12.0 | 300 | 0.4376 | 0.63 |
| No log | 13.0 | 325 | 0.4062 | 0.57 |
| No log | 14.0 | 350 | 0.4000 | 0.61 |
| No log | 15.0 | 375 | 0.4052 | 0.63 |
| No log | 16.0 | 400 | 0.3961 | 0.68 |
| No log | 17.0 | 425 | 0.3976 | 0.67 |
| No log | 18.0 | 450 | 0.4186 | 0.65 |
| No log | 19.0 | 475 | 0.4304 | 0.63 |
| 0.731 | 20.0 | 500 | 0.4358 | 0.69 |
| 0.731 | 21.0 | 525 | 0.4135 | 0.68 |
| 0.731 | 22.0 | 550 | 0.4180 | 0.68 |
| 0.731 | 23.0 | 575 | 0.4627 | 0.66 |
| 0.731 | 24.0 | 600 | 0.4150 | 0.65 |
| 0.731 | 25.0 | 625 | 0.4005 | 0.67 |
| 0.731 | 26.0 | 650 | 0.4123 | 0.7 |
| 0.731 | 27.0 | 675 | 0.4342 | 0.69 |
| 0.731 | 28.0 | 700 | 0.4551 | 0.67 |
| 0.731 | 29.0 | 725 | 0.4222 | 0.69 |
| 0.731 | 30.0 | 750 | 0.4226 | 0.71 |
| 0.731 | 31.0 | 775 | 0.4702 | 0.69 |
| 0.731 | 32.0 | 800 | 0.4100 | 0.7 |
| 0.731 | 33.0 | 825 | 0.4318 | 0.69 |
| 0.731 | 34.0 | 850 | 0.4447 | 0.71 |
| 0.731 | 35.0 | 875 | 0.3881 | 0.72 |
| 0.731 | 36.0 | 900 | 0.4234 | 0.69 |
| 0.731 | 37.0 | 925 | 0.4869 | 0.69 |
| 0.731 | 38.0 | 950 | 0.4352 | 0.71 |
| 0.731 | 39.0 | 975 | 0.4465 | 0.71 |
| 0.5086 | 40.0 | 1000 | 0.4135 | 0.7 |
| 0.5086 | 41.0 | 1025 | 0.4061 | 0.7 |
| 0.5086 | 42.0 | 1050 | 0.4437 | 0.72 |
| 0.5086 | 43.0 | 1075 | 0.4461 | 0.72 |
| 0.5086 | 44.0 | 1100 | 0.4144 | 0.69 |
| 0.5086 | 45.0 | 1125 | 0.3973 | 0.71 |
| 0.5086 | 46.0 | 1150 | 0.4511 | 0.73 |
| 0.5086 | 47.0 | 1175 | 0.4273 | 0.71 |
| 0.5086 | 48.0 | 1200 | 0.4100 | 0.71 |
| 0.5086 | 49.0 | 1225 | 0.4209 | 0.72 |
| 0.5086 | 50.0 | 1250 | 0.4191 | 0.74 |
| 0.5086 | 51.0 | 1275 | 0.4023 | 0.74 |
| 0.5086 | 52.0 | 1300 | 0.4038 | 0.72 |
| 0.5086 | 53.0 | 1325 | 0.4148 | 0.73 |
| 0.5086 | 54.0 | 1350 | 0.4263 | 0.72 |
| 0.5086 | 55.0 | 1375 | 0.4331 | 0.73 |
| 0.5086 | 56.0 | 1400 | 0.4373 | 0.71 |
| 0.5086 | 57.0 | 1425 | 0.4081 | 0.72 |
| 0.5086 | 58.0 | 1450 | 0.4078 | 0.71 |
| 0.5086 | 59.0 | 1475 | 0.4250 | 0.72 |
| 0.4268 | 60.0 | 1500 | 0.4224 | 0.7 |
| 0.4268 | 61.0 | 1525 | 0.4255 | 0.7 |
| 0.4268 | 62.0 | 1550 | 0.4114 | 0.72 |
| 0.4268 | 63.0 | 1575 | 0.4266 | 0.72 |
| 0.4268 | 64.0 | 1600 | 0.4097 | 0.72 |
| 0.4268 | 65.0 | 1625 | 0.4053 | 0.72 |
| 0.4268 | 66.0 | 1650 | 0.4051 | 0.71 |
| 0.4268 | 67.0 | 1675 | 0.4135 | 0.73 |
| 0.4268 | 68.0 | 1700 | 0.3959 | 0.74 |
| 0.4268 | 69.0 | 1725 | 0.4162 | 0.72 |
| 0.4268 | 70.0 | 1750 | 0.4061 | 0.73 |
| 0.4268 | 71.0 | 1775 | 0.4016 | 0.71 |
| 0.4268 | 72.0 | 1800 | 0.4194 | 0.71 |
| 0.4268 | 73.0 | 1825 | 0.4098 | 0.72 |
| 0.4268 | 74.0 | 1850 | 0.4179 | 0.71 |
| 0.4268 | 75.0 | 1875 | 0.4105 | 0.71 |
| 0.4268 | 76.0 | 1900 | 0.4140 | 0.72 |
| 0.4268 | 77.0 | 1925 | 0.4081 | 0.73 |
| 0.4268 | 78.0 | 1950 | 0.4044 | 0.73 |
| 0.4268 | 79.0 | 1975 | 0.3996 | 0.72 |
| 0.3915 | 80.0 | 2000 | 0.4014 | 0.72 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
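
An optional sanity check that a local environment matches the versions listed above; the torch check uses a prefix match because the reported build carries a `+cu118` suffix.

```python
# Verify the environment matches the versions used for training.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.26.1"
assert torch.__version__.startswith("2.0.1")  # reported: 2.0.1+cu118
assert datasets.__version__ == "2.12.0"
assert tokenizers.__version__ == "0.13.3"
```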