20230824083855

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 60), it achieves the following results on the evaluation set:

  • Loss: 0.0821
  • Accuracy: 0.7473

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
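As a rough illustration of what these settings imply, the sketch below computes the learning rate at a given optimizer step under a linear schedule. It assumes zero warmup steps (the card lists no warmup setting) and takes 623 steps per epoch from the first row of the training results; `linear_lr` and the constant names are hypothetical helpers for this example, not part of the training script.

```python
# Values copied from the hyperparameter list and the training results.
BASE_LR = 0.003
STEPS_PER_EPOCH = 623  # step count logged at epoch 1.0
NUM_EPOCHS = 60
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 37380, the last logged step

# Assumption: no warmup, because the card does not list a warmup setting.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = TOTAL_STEPS - step
    return BASE_LR * max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))


# Print the scheduled learning rate at a few epoch boundaries.
for epoch in (0, 15, 30, 45, 60):
    print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate starts at 0.003, is halved by the end of epoch 30, and reaches 0 at the final step.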

Training results

Training Loss  Epoch  Step   Validation Loss  Accuracy
0.5366         1.0    623    0.8415           0.4729
0.3757         2.0    1246   0.3098           0.4693
0.3001         3.0    1869   0.5999           0.4729
0.3227         4.0    2492   0.2808           0.4729
0.3109         5.0    3115   0.2772           0.5487
0.3034         6.0    3738   0.1529           0.6029
0.2648         7.0    4361   0.1565           0.6029
0.2104         8.0    4984   0.1394           0.6245
0.1926         9.0    5607   0.1404           0.6390
0.175          10.0   6230   0.1292           0.6859
0.1634         11.0   6853   0.1174           0.7004
0.1618         12.0   7476   0.1228           0.6787
0.1555         13.0   8099   0.1287           0.6534
0.1534         14.0   8722   0.1461           0.6570
0.1523         15.0   9345   0.1356           0.6426
0.1448         16.0   9968   0.1065           0.6968
0.1402         17.0   10591  0.1011           0.7292
0.1342         18.0   11214  0.1112           0.6643
0.1388         19.0   11837  0.1255           0.6823
0.1281         20.0   12460  0.0965           0.7220
0.128          21.0   13083  0.0985           0.7040
0.1236         22.0   13706  0.1339           0.7040
0.1267         23.0   14329  0.1238           0.7365
0.1186         24.0   14952  0.0942           0.7292
0.1101         25.0   15575  0.0923           0.7220
0.1122         26.0   16198  0.0919           0.7401
0.1088         27.0   16821  0.0893           0.7292
0.1059         28.0   17444  0.0897           0.7401
0.106          29.0   18067  0.0878           0.7509
0.1019         30.0   18690  0.0945           0.7365
0.1047         31.0   19313  0.0900           0.7256
0.1011         32.0   19936  0.0884           0.7437
0.0962         33.0   20559  0.0874           0.7329
0.0971         34.0   21182  0.0933           0.7329
0.0914         35.0   21805  0.0845           0.7473
0.0965         36.0   22428  0.0914           0.7365
0.0914         37.0   23051  0.0855           0.7292
0.0894         38.0   23674  0.0867           0.7256
0.087          39.0   24297  0.0861           0.7329
0.0865         40.0   24920  0.0830           0.7329
0.0851         41.0   25543  0.0827           0.7473
0.0837         42.0   26166  0.0818           0.7365
0.0865         43.0   26789  0.0840           0.7401
0.0807         44.0   27412  0.0815           0.7292
0.0829         45.0   28035  0.0840           0.7365
0.0814         46.0   28658  0.0851           0.7401
0.0798         47.0   29281  0.0841           0.7401
0.0806         48.0   29904  0.0838           0.7473
0.0773         49.0   30527  0.0823           0.7401
0.0769         50.0   31150  0.0813           0.7329
0.0763         51.0   31773  0.0822           0.7509
0.0792         52.0   32396  0.0833           0.7365
0.0772         53.0   33019  0.0819           0.7365
0.0732         54.0   33642  0.0810           0.7365
0.0708         55.0   34265  0.0808           0.7365
0.0741         56.0   34888  0.0824           0.7509
0.0725         57.0   35511  0.0816           0.7437
0.072          58.0   36134  0.0812           0.7437
0.0712         59.0   36757  0.0827           0.7401
0.0707         60.0   37380  0.0821           0.7473
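The headline metrics correspond to the last row (epoch 60), although the table also contains a lower validation loss (0.0808 at epoch 55) and a higher accuracy (0.7509, first reached at epoch 29). A minimal sketch of how such rows could be parsed and compared is shown below. `EvalRow`, `parse_rows`, and `EXCERPT` are hypothetical names introduced for this example, and `EXCERPT` holds only three rows copied from the table rather than the full log.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


def parse_rows(text: str) -> List[EvalRow]:
    """Parse whitespace-separated result rows, skipping headers and bad lines."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # e.g. the header line, which has more than five tokens
        try:
            t, e, s, v, a = parts
            rows.append(EvalRow(float(t), float(e), int(s), float(v), float(a)))
        except ValueError:
            continue  # non-numeric content
    return rows


# Excerpt only: three rows copied verbatim from the training results.
EXCERPT = """
0.106 29.0 18067 0.0878 0.7509
0.0708 55.0 34265 0.0808 0.7365
0.0707 60.0 37380 0.0821 0.7473
"""

rows = parse_rows(EXCERPT)
best_loss = min(rows, key=lambda r: r.val_loss)
best_acc = max(rows, key=lambda r: r.accuracy)
print("lowest validation loss:", best_loss.epoch, best_loss.val_loss)
print("highest accuracy:", best_acc.epoch, best_acc.accuracy)
```

Passing the full table text to `parse_rows` instead of the excerpt would apply the same comparison to all 60 epochs.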

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Dataset used to train dkqjrm/20230824083855: super_glue