
20230831201806

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6291
  • Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed
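
Since the card does not document usage, here is a minimal sketch of loading the checkpoint for inference. The SuperGLUE subset and label mapping are not recorded in this card, so the sentence-pair input format and the binary head below are assumptions, not confirmed details.

```python
# Minimal inference sketch for dkqjrm/20230831201806.
# Assumption: the fine-tuning task was a binary sentence-pair task
# (the card only says "super_glue"), so the example inputs and the
# two-label interpretation are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "dkqjrm/20230831201806"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Illustrative sentence pair; the real input format depends on the
# (undocumented) SuperGLUE subset used for fine-tuning.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is resting on a mat.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```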

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TrainingArguments follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
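
As referenced above, this is a sketch of how the listed values map onto `TrainingArguments` in Transformers 4.26.1 (the version in the framework list below). Only the values from this card are real; the output directory and evaluation cadence are assumptions.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments
# for transformers 4.26.1. Values not listed in the card are marked
# as placeholders or assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831201806",     # placeholder, not from the card
    learning_rate=5e-4,              # learning_rate: 0.0005
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,                         # seed: 11
    adam_beta1=0.9,                  # optimizer: Adam, betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon: 1e-08
    lr_scheduler_type="linear",      # lr_scheduler_type: linear
    num_train_epochs=80.0,           # num_epochs: 80.0
    evaluation_strategy="epoch",     # assumption: per-epoch eval, matching
                                     # the results table below
)
```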

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 340 | 0.6240 | 0.5 |
| 0.6425 | 2.0 | 680 | 0.6234 | 0.5 |
| 0.6397 | 3.0 | 1020 | 0.6203 | 0.5 |
| 0.6397 | 4.0 | 1360 | 0.6364 | 0.5 |
| 0.6363 | 5.0 | 1700 | 0.7003 | 0.5 |
| 0.6373 | 6.0 | 2040 | 0.6233 | 0.5 |
| 0.6373 | 7.0 | 2380 | 0.6233 | 0.5 |
| 0.637 | 8.0 | 2720 | 0.6515 | 0.5 |
| 0.631 | 9.0 | 3060 | 0.6234 | 0.5 |
| 0.631 | 10.0 | 3400 | 0.6299 | 0.5 |
| 0.633 | 11.0 | 3740 | 0.6315 | 0.5 |
| 0.6325 | 12.0 | 4080 | 0.6281 | 0.5 |
| 0.6325 | 13.0 | 4420 | 0.6434 | 0.5 |
| 0.6267 | 14.0 | 4760 | 0.6233 | 0.5 |
| 0.6323 | 15.0 | 5100 | 0.6253 | 0.5 |
| 0.6323 | 16.0 | 5440 | 0.6233 | 0.5 |
| 0.6325 | 17.0 | 5780 | 0.6314 | 0.5 |
| 0.6274 | 18.0 | 6120 | 0.6265 | 0.5 |
| 0.6274 | 19.0 | 6460 | 0.6298 | 0.5 |
| 0.6301 | 20.0 | 6800 | 0.6363 | 0.5 |
| 0.6268 | 21.0 | 7140 | 0.6296 | 0.5 |
| 0.6268 | 22.0 | 7480 | 0.6402 | 0.5 |
| 0.6316 | 23.0 | 7820 | 0.6282 | 0.5 |
| 0.6272 | 24.0 | 8160 | 0.6233 | 0.5 |
| 0.6314 | 25.0 | 8500 | 0.6245 | 0.5 |
| 0.6314 | 26.0 | 8840 | 0.6702 | 0.5 |
| 0.6298 | 27.0 | 9180 | 0.6484 | 0.5 |
| 0.6282 | 28.0 | 9520 | 0.6235 | 0.5 |
| 0.6282 | 29.0 | 9860 | 0.6524 | 0.5 |
| 0.6259 | 30.0 | 10200 | 0.6245 | 0.5 |
| 0.6271 | 31.0 | 10540 | 0.6233 | 0.5 |
| 0.6271 | 32.0 | 10880 | 0.6320 | 0.5 |
| 0.6264 | 33.0 | 11220 | 0.6240 | 0.5 |
| 0.6265 | 34.0 | 11560 | 0.6325 | 0.5 |
| 0.6265 | 35.0 | 11900 | 0.6329 | 0.5 |
| 0.6268 | 36.0 | 12240 | 0.6377 | 0.5 |
| 0.6261 | 37.0 | 12580 | 0.6234 | 0.5 |
| 0.6261 | 38.0 | 12920 | 0.6323 | 0.5 |
| 0.626 | 39.0 | 13260 | 0.6402 | 0.5 |
| 0.6245 | 40.0 | 13600 | 0.6264 | 0.5 |
| 0.6245 | 41.0 | 13940 | 0.6245 | 0.5 |
| 0.6253 | 42.0 | 14280 | 0.6278 | 0.5 |
| 0.6223 | 43.0 | 14620 | 0.6260 | 0.5 |
| 0.6223 | 44.0 | 14960 | 0.6236 | 0.5 |
| 0.6266 | 45.0 | 15300 | 0.6378 | 0.5 |
| 0.6219 | 46.0 | 15640 | 0.6349 | 0.5 |
| 0.6219 | 47.0 | 15980 | 0.6393 | 0.5 |
| 0.6256 | 48.0 | 16320 | 0.6266 | 0.5 |
| 0.6241 | 49.0 | 16660 | 0.6338 | 0.5 |
| 0.624 | 50.0 | 17000 | 0.6237 | 0.5 |
| 0.624 | 51.0 | 17340 | 0.6265 | 0.5 |
| 0.6214 | 52.0 | 17680 | 0.6259 | 0.5 |
| 0.627 | 53.0 | 18020 | 0.6324 | 0.5 |
| 0.627 | 54.0 | 18360 | 0.6257 | 0.5 |
| 0.6218 | 55.0 | 18700 | 0.6246 | 0.5 |
| 0.621 | 56.0 | 19040 | 0.6242 | 0.5 |
| 0.621 | 57.0 | 19380 | 0.6336 | 0.5 |
| 0.6212 | 58.0 | 19720 | 0.6236 | 0.5 |
| 0.6239 | 59.0 | 20060 | 0.6489 | 0.5 |
| 0.6239 | 60.0 | 20400 | 0.6256 | 0.5 |
| 0.6218 | 61.0 | 20740 | 0.6251 | 0.5 |
| 0.6216 | 62.0 | 21080 | 0.6279 | 0.5 |
| 0.6216 | 63.0 | 21420 | 0.6305 | 0.5 |
| 0.6196 | 64.0 | 21760 | 0.6326 | 0.5 |
| 0.6251 | 65.0 | 22100 | 0.6288 | 0.5 |
| 0.6251 | 66.0 | 22440 | 0.6412 | 0.5 |
| 0.6162 | 67.0 | 22780 | 0.6270 | 0.5 |
| 0.6231 | 68.0 | 23120 | 0.6261 | 0.5 |
| 0.6231 | 69.0 | 23460 | 0.6254 | 0.5 |
| 0.6215 | 70.0 | 23800 | 0.6237 | 0.5 |
| 0.6202 | 71.0 | 24140 | 0.6265 | 0.5 |
| 0.6202 | 72.0 | 24480 | 0.6329 | 0.5 |
| 0.6184 | 73.0 | 24820 | 0.6292 | 0.5 |
| 0.6207 | 74.0 | 25160 | 0.6304 | 0.5 |
| 0.6193 | 75.0 | 25500 | 0.6271 | 0.5 |
| 0.6193 | 76.0 | 25840 | 0.6301 | 0.5 |
| 0.6202 | 77.0 | 26180 | 0.6261 | 0.5 |
| 0.6188 | 78.0 | 26520 | 0.6289 | 0.5 |
| 0.6188 | 79.0 | 26860 | 0.6293 | 0.5 |
| 0.6197 | 80.0 | 27200 | 0.6291 | 0.5 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
