20230831190406

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6234
  • Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0007
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
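
The linear learning-rate schedule listed above can be sketched as follows. This is a minimal illustration, not the training code itself: it assumes the rate decays to zero over the 27200 total steps shown in the results table and that no warmup was used (the card lists no warmup setting). The names `linear_lr`, `BASE_LR`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical.

```python
BASE_LR = 0.0007      # learning_rate from the hyperparameter list
TOTAL_STEPS = 27200   # last step in the results table (80 epochs x 340 steps)
WARMUP_STEPS = 0      # assumption: the card does not list a warmup setting


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Return the learning rate at `step` under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the end of epochs 1, 40, and 80 (340 steps per epoch).
    for epoch in (1, 40, 80):
        print(epoch, linear_lr(epoch * 340))
```

Under these assumptions the rate is halved by the end of epoch 40 and reaches zero at the final step.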

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| No log | 1.0 | 340 | 0.6536 | 0.5 |
| 0.6466 | 2.0 | 680 | 0.6207 | 0.5 |
| 0.6506 | 3.0 | 1020 | 0.6654 | 0.5 |
| 0.6506 | 4.0 | 1360 | 0.6698 | 0.5 |
| 0.6458 | 5.0 | 1700 | 0.6234 | 0.5 |
| 0.6363 | 6.0 | 2040 | 0.6246 | 0.5 |
| 0.6363 | 7.0 | 2380 | 0.6367 | 0.5 |
| 0.6401 | 8.0 | 2720 | 0.6582 | 0.5 |
| 0.6347 | 9.0 | 3060 | 0.6257 | 0.5 |
| 0.6347 | 10.0 | 3400 | 0.6281 | 0.5 |
| 0.6378 | 11.0 | 3740 | 0.6234 | 0.5 |
| 0.637 | 12.0 | 4080 | 0.6274 | 0.5 |
| 0.637 | 13.0 | 4420 | 0.6362 | 0.5 |
| 0.6313 | 14.0 | 4760 | 0.6290 | 0.5 |
| 0.6359 | 15.0 | 5100 | 0.6302 | 0.5 |
| 0.6359 | 16.0 | 5440 | 0.6246 | 0.5 |
| 0.639 | 17.0 | 5780 | 0.6319 | 0.5 |
| 0.6302 | 18.0 | 6120 | 0.6255 | 0.5 |
| 0.6302 | 19.0 | 6460 | 0.6325 | 0.5 |
| 0.6329 | 20.0 | 6800 | 0.6434 | 0.5 |
| 0.6309 | 21.0 | 7140 | 0.6238 | 0.5 |
| 0.6309 | 22.0 | 7480 | 0.6237 | 0.5 |
| 0.6325 | 23.0 | 7820 | 0.6296 | 0.5 |
| 0.6303 | 24.0 | 8160 | 0.6249 | 0.5 |
| 0.6357 | 25.0 | 8500 | 0.6235 | 0.5 |
| 0.6357 | 26.0 | 8840 | 0.6258 | 0.5 |
| 0.6327 | 27.0 | 9180 | 0.6442 | 0.5 |
| 0.6309 | 28.0 | 9520 | 0.6329 | 0.5 |
| 0.6309 | 29.0 | 9860 | 0.6374 | 0.5 |
| 0.6304 | 30.0 | 10200 | 0.6243 | 0.5 |
| 0.6311 | 31.0 | 10540 | 0.6302 | 0.5 |
| 0.6311 | 32.0 | 10880 | 0.6247 | 0.5 |
| 0.6294 | 33.0 | 11220 | 0.6233 | 0.5 |
| 0.6303 | 34.0 | 11560 | 0.6252 | 0.5 |
| 0.6303 | 35.0 | 11900 | 0.6365 | 0.5 |
| 0.63 | 36.0 | 12240 | 0.6300 | 0.5 |
| 0.6304 | 37.0 | 12580 | 0.6290 | 0.5 |
| 0.6304 | 38.0 | 12920 | 0.6243 | 0.5 |
| 0.6288 | 39.0 | 13260 | 0.6440 | 0.5 |
| 0.6298 | 40.0 | 13600 | 0.6260 | 0.5 |
| 0.6298 | 41.0 | 13940 | 0.6296 | 0.5 |
| 0.6292 | 42.0 | 14280 | 0.6245 | 0.5 |
| 0.6255 | 43.0 | 14620 | 0.6253 | 0.5 |
| 0.6255 | 44.0 | 14960 | 0.6459 | 0.5 |
| 0.631 | 45.0 | 15300 | 0.6321 | 0.5 |
| 0.6248 | 46.0 | 15640 | 0.6314 | 0.5 |
| 0.6248 | 47.0 | 15980 | 0.6335 | 0.5 |
| 0.6293 | 48.0 | 16320 | 0.6240 | 0.5 |
| 0.6285 | 49.0 | 16660 | 0.6238 | 0.5 |
| 0.6277 | 50.0 | 17000 | 0.6247 | 0.5 |
| 0.6277 | 51.0 | 17340 | 0.6378 | 0.5 |
| 0.625 | 52.0 | 17680 | 0.6237 | 0.5 |
| 0.6301 | 53.0 | 18020 | 0.6246 | 0.5 |
| 0.6301 | 54.0 | 18360 | 0.6236 | 0.5 |
| 0.6247 | 55.0 | 18700 | 0.6237 | 0.5 |
| 0.6253 | 56.0 | 19040 | 0.6252 | 0.5 |
| 0.6253 | 57.0 | 19380 | 0.6261 | 0.5 |
| 0.6243 | 58.0 | 19720 | 0.6250 | 0.5 |
| 0.6268 | 59.0 | 20060 | 0.6387 | 0.5 |
| 0.6268 | 60.0 | 20400 | 0.6233 | 0.5 |
| 0.625 | 61.0 | 20740 | 0.6239 | 0.5 |
| 0.6245 | 62.0 | 21080 | 0.6233 | 0.5 |
| 0.6245 | 63.0 | 21420 | 0.6256 | 0.5 |
| 0.6232 | 64.0 | 21760 | 0.6263 | 0.5 |
| 0.6279 | 65.0 | 22100 | 0.6233 | 0.5 |
| 0.6279 | 66.0 | 22440 | 0.6339 | 0.5 |
| 0.6185 | 67.0 | 22780 | 0.6237 | 0.5 |
| 0.627 | 68.0 | 23120 | 0.6246 | 0.5 |
| 0.627 | 69.0 | 23460 | 0.6241 | 0.5 |
| 0.6242 | 70.0 | 23800 | 0.6254 | 0.5 |
| 0.6229 | 71.0 | 24140 | 0.6236 | 0.5 |
| 0.6229 | 72.0 | 24480 | 0.6242 | 0.5 |
| 0.621 | 73.0 | 24820 | 0.6238 | 0.5 |
| 0.6226 | 74.0 | 25160 | 0.6237 | 0.5 |
| 0.6222 | 75.0 | 25500 | 0.6233 | 0.5 |
| 0.6222 | 76.0 | 25840 | 0.6244 | 0.5 |
| 0.6224 | 77.0 | 26180 | 0.6234 | 0.5 |
| 0.6212 | 78.0 | 26520 | 0.6239 | 0.5 |
| 0.6212 | 79.0 | 26860 | 0.6238 | 0.5 |
| 0.6222 | 80.0 | 27200 | 0.6234 | 0.5 |
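
The results above can be summarized with a short script. The sketch below uses only a handful of rows copied from the table (epochs 1, 2, 5, 33, and 80) to keep it brief; the helper name `summarize` and the tuple layout are hypothetical. It reports which of the included epochs has the lowest validation loss, whether accuracy ever changes, and how many optimizer steps make up one epoch.

```python
# (epoch, step, validation_loss, accuracy) rows copied from the results table.
# Only a subset of the 80 rows is included to keep the sketch short.
rows = [
    (1, 340, 0.6536, 0.5),
    (2, 680, 0.6207, 0.5),
    (5, 1700, 0.6234, 0.5),
    (33, 11220, 0.6233, 0.5),
    (80, 27200, 0.6234, 0.5),
]


def summarize(rows):
    """Summarize evaluation rows: best epoch, accuracy spread, steps per epoch."""
    best = min(rows, key=lambda r: r[2])          # row with lowest validation loss
    accuracies = {r[3] for r in rows}             # distinct accuracy values seen
    steps_per_epoch = {r[1] // r[0] for r in rows}
    return {
        "best_epoch": best[0],
        "best_validation_loss": best[2],
        "accuracy_is_constant": len(accuracies) == 1,
        "steps_per_epoch": steps_per_epoch,
    }


summary = summarize(rows)
print(summary)
```

With a train batch size of 16, 340 steps per epoch corresponds to at most 5440 training examples per epoch, assuming a single device and no gradient accumulation (neither is stated in the card).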

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3