20230831142618

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6294
  • Accuracy: 0.5016
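
The card does not state which SuperGLUE task the model was fine-tuned on, so the following is only a minimal loading sketch, assuming the checkpoint carries a sequence-classification head; the example sentence pair and the meaning of the label indices are assumptions, not documented here:

```python
# Minimal loading sketch for dkqjrm/20230831142618. Assumes a
# sequence-classification head; the SuperGLUE task and label names
# are not documented on this card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230831142618"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Most SuperGLUE tasks are text-pair classification, so we pass a pair.
inputs = tokenizer(
    "The cat sat on the mat.", "A cat is on a mat.", return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```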

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

  • learning_rate: 0.0007
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
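
As a hedged illustration, these settings correspond roughly to the `TrainingArguments` below; only the listed values come from the card, while `output_dir` and the rest of the training script are assumptions:

```python
# Sketch of the reported hyperparameters as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831142618",   # assumed, not documented on the card
    learning_rate=7e-4,            # 0.0007
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```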

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6452          | 0.5      |
| 0.6361        | 2.0   | 680   | 0.6325          | 0.5      |
| 0.6378        | 3.0   | 1020  | 0.6175          | 0.5      |
| 0.6378        | 4.0   | 1360  | 0.6705          | 0.5      |
| 0.6367        | 5.0   | 1700  | 0.6476          | 0.5      |
| 0.6284        | 6.0   | 2040  | 0.6180          | 0.5      |
| 0.6284        | 7.0   | 2380  | 0.6174          | 0.5      |
| 0.6308        | 8.0   | 2720  | 0.6441          | 0.5      |
| 0.622         | 9.0   | 3060  | 0.6199          | 0.5      |
| 0.622         | 10.0  | 3400  | 0.6222          | 0.5      |
| 0.6264        | 11.0  | 3740  | 0.6177          | 0.5      |
| 0.6249        | 12.0  | 4080  | 0.6172          | 0.5      |
| 0.6249        | 13.0  | 4420  | 0.6872          | 0.5      |
| 0.6205        | 14.0  | 4760  | 0.6174          | 0.5      |
| 0.6347        | 15.0  | 5100  | 0.6236          | 0.5      |
| 0.6347        | 16.0  | 5440  | 0.6170          | 0.5      |
| 0.6369        | 17.0  | 5780  | 0.6180          | 0.5      |
| 0.623         | 18.0  | 6120  | 0.6256          | 0.5      |
| 0.623         | 19.0  | 6460  | 0.6349          | 0.5      |
| 0.6278        | 20.0  | 6800  | 0.6554          | 0.5      |
| 0.6255        | 21.0  | 7140  | 0.6173          | 0.5      |
| 0.6255        | 22.0  | 7480  | 0.6215          | 0.5      |
| 0.6286        | 23.0  | 7820  | 0.6201          | 0.5      |
| 0.6235        | 24.0  | 8160  | 0.6176          | 0.5      |
| 0.6289        | 25.0  | 8500  | 0.6216          | 0.5      |
| 0.6289        | 26.0  | 8840  | 0.6522          | 0.5      |
| 0.6236        | 27.0  | 9180  | 0.6193          | 0.5      |
| 0.6227        | 28.0  | 9520  | 0.6175          | 0.5      |
| 0.6227        | 29.0  | 9860  | 0.6504          | 0.5      |
| 0.6211        | 30.0  | 10200 | 0.6442          | 0.5      |
| 0.623         | 31.0  | 10540 | 0.6181          | 0.5      |
| 0.623         | 32.0  | 10880 | 0.6220          | 0.5      |
| 0.6206        | 33.0  | 11220 | 0.6185          | 0.5      |
| 0.621         | 34.0  | 11560 | 0.6238          | 0.5      |
| 0.621         | 35.0  | 11900 | 0.6277          | 0.5      |
| 0.6216        | 36.0  | 12240 | 0.6352          | 0.5      |
| 0.6211        | 37.0  | 12580 | 0.6170          | 0.5      |
| 0.6211        | 38.0  | 12920 | 0.6169          | 0.5      |
| 0.6203        | 39.0  | 13260 | 0.6410          | 0.5      |
| 0.619         | 40.0  | 13600 | 0.6190          | 0.5      |
| 0.619         | 41.0  | 13940 | 0.6228          | 0.5      |
| 0.6214        | 42.0  | 14280 | 0.6214          | 0.5      |
| 0.617         | 43.0  | 14620 | 0.6212          | 0.5      |
| 0.617         | 44.0  | 14960 | 0.6172          | 0.5      |
| 0.6211        | 45.0  | 15300 | 0.6309          | 0.5      |
| 0.6168        | 46.0  | 15640 | 0.6250          | 0.5      |
| 0.6168        | 47.0  | 15980 | 0.6371          | 0.5      |
| 0.621         | 48.0  | 16320 | 0.6187          | 0.5      |
| 0.6179        | 49.0  | 16660 | 0.6272          | 0.5      |
| 0.6185        | 50.0  | 17000 | 0.6184          | 0.5      |
| 0.6185        | 51.0  | 17340 | 0.6207          | 0.5      |
| 0.6154        | 52.0  | 17680 | 0.6187          | 0.5      |
| 0.6204        | 53.0  | 18020 | 0.6225          | 0.5      |
| 0.6204        | 54.0  | 18360 | 0.6177          | 0.5      |
| 0.6161        | 55.0  | 18700 | 0.6319          | 0.5      |
| 0.6231        | 56.0  | 19040 | 0.6109          | 0.5      |
| 0.6231        | 57.0  | 19380 | 0.6058          | 0.5      |
| 0.6051        | 58.0  | 19720 | 0.6064          | 0.5      |
| 0.5939        | 59.0  | 20060 | 0.6035          | 0.5      |
| 0.5939        | 60.0  | 20400 | 0.6428          | 0.5125   |
| 0.5818        | 61.0  | 20740 | 0.5962          | 0.5      |
| 0.5724        | 62.0  | 21080 | 0.5954          | 0.5      |
| 0.5724        | 63.0  | 21420 | 0.5971          | 0.5      |
| 0.565         | 64.0  | 21760 | 0.6361          | 0.5047   |
| 0.563         | 65.0  | 22100 | 0.6182          | 0.5016   |
| 0.563         | 66.0  | 22440 | 0.6006          | 0.5      |
| 0.5456        | 67.0  | 22780 | 0.6329          | 0.5016   |
| 0.5507        | 68.0  | 23120 | 0.6332          | 0.5031   |
| 0.5507        | 69.0  | 23460 | 0.6358          | 0.5      |
| 0.5446        | 70.0  | 23800 | 0.6326          | 0.5031   |
| 0.5364        | 71.0  | 24140 | 0.6283          | 0.5016   |
| 0.5364        | 72.0  | 24480 | 0.6214          | 0.5016   |
| 0.5335        | 73.0  | 24820 | 0.6173          | 0.5      |
| 0.532         | 74.0  | 25160 | 0.6214          | 0.5016   |
| 0.5274        | 75.0  | 25500 | 0.6298          | 0.5016   |
| 0.5274        | 76.0  | 25840 | 0.6313          | 0.5016   |
| 0.5265        | 77.0  | 26180 | 0.6241          | 0.5      |
| 0.5233        | 78.0  | 26520 | 0.6215          | 0.5      |
| 0.5233        | 79.0  | 26860 | 0.6280          | 0.5016   |
| 0.5235        | 80.0  | 27200 | 0.6294          | 0.5016   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
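
To reproduce this environment, pinning the versions above should suffice; a minimal requirements sketch (how the original environment was actually built, including the cu118 PyTorch wheel source, is an assumption):

```
transformers==4.26.1
torch==2.0.1
datasets==2.12.0
tokenizers==0.13.3
```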