20230824064444

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0709
  • Accuracy: 0.7329

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
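With lr_scheduler_type: linear, the learning rate decays linearly from 0.003 to 0 over the run's 18,720 optimizer steps (60 epochs × 312 steps per epoch, per the table below). A minimal sketch of that schedule, assuming no warmup steps (the card does not list any):

```python
def linear_lr(step: int, base_lr: float = 3e-3, total_steps: int = 18720) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(0))      # 0.003  (start of training)
print(linear_lr(9360))   # 0.0015 (halfway, end of epoch 30)
print(linear_lr(18720))  # 0.0    (end of training)
```

Note that 0.003 is a fairly aggressive learning rate for fine-tuning a BERT-large model; whether warmup or gradient clipping was used is not recorded here.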

Training results

Training Loss  Epoch  Step   Validation Loss  Accuracy
No log         1.0    312    0.4733           0.5307
0.3538         2.0    624    0.1917           0.5126
0.3538         3.0    936    0.1696           0.5560
0.2775         4.0    1248   0.1700           0.5271
0.2538         5.0    1560   0.3497           0.5343
0.2538         6.0    1872   0.2183           0.5632
0.259          7.0    2184   0.1783           0.5018
0.259          8.0    2496   0.2321           0.5848
0.2587         9.0    2808   0.2081           0.6101
0.2211         10.0   3120   0.1194           0.6715
0.2211         11.0   3432   0.1505           0.6390
0.198          12.0   3744   0.1130           0.7004
0.1939         13.0   4056   0.1187           0.6679
0.1939         14.0   4368   0.1175           0.6787
0.1687         15.0   4680   0.1092           0.7040
0.1687         16.0   4992   0.0984           0.7076
0.1511         17.0   5304   0.1032           0.7076
0.1448         18.0   5616   0.1024           0.7401
0.1448         19.0   5928   0.0902           0.7112
0.1392         20.0   6240   0.0972           0.7112
0.1283         21.0   6552   0.0880           0.7184
0.1283         22.0   6864   0.0892           0.7329
0.1257         23.0   7176   0.1156           0.7401
0.1257         24.0   7488   0.0940           0.7329
0.1215         25.0   7800   0.0876           0.7401
0.1184         26.0   8112   0.1289           0.7437
0.1184         27.0   8424   0.0808           0.7256
0.1112         28.0   8736   0.0823           0.7401
0.1139         29.0   9048   0.0838           0.7256
0.1139         30.0   9360   0.0855           0.7220
0.1095         31.0   9672   0.0813           0.7256
0.1095         32.0   9984   0.0765           0.7256
0.106          33.0   10296  0.0847           0.7365
0.1034         34.0   10608  0.0844           0.7509
0.1034         35.0   10920  0.0811           0.7184
0.0991         36.0   11232  0.0811           0.7292
0.0938         37.0   11544  0.0847           0.7365
0.0938         38.0   11856  0.0824           0.7256
0.0973         39.0   12168  0.0760           0.7292
0.0973         40.0   12480  0.0786           0.7220
0.0908         41.0   12792  0.0732           0.7473
0.0894         42.0   13104  0.0763           0.7401
0.0894         43.0   13416  0.0811           0.7365
0.0896         44.0   13728  0.0734           0.7473
0.0882         45.0   14040  0.0747           0.7329
0.0882         46.0   14352  0.0729           0.7401
0.0847         47.0   14664  0.0723           0.7329
0.0847         48.0   14976  0.0748           0.7401
0.0854         49.0   15288  0.0755           0.7292
0.0813         50.0   15600  0.0715           0.7329
0.0813         51.0   15912  0.0719           0.7292
0.0845         52.0   16224  0.0721           0.7401
0.0821         53.0   16536  0.0711           0.7292
0.0821         54.0   16848  0.0714           0.7437
0.0802         55.0   17160  0.0711           0.7401
0.0802         56.0   17472  0.0718           0.7329
0.0798         57.0   17784  0.0708           0.7220
0.0796         58.0   18096  0.0715           0.7365
0.0796         59.0   18408  0.0712           0.7329
0.0806         60.0   18720  0.0709           0.7329
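The headline numbers at the top of the card are simply those of the final epoch; they are not the best checkpoint by either metric. In the table above, validation loss bottoms out at epoch 57 (0.0708) while accuracy peaks at epoch 34 (0.7509). A small sketch of how checkpoint selection depends on the criterion, using a few rows from the table:

```python
# (epoch, validation_loss, accuracy) for selected rows of the table above
rows = [
    (34, 0.0844, 0.7509),
    (41, 0.0732, 0.7473),
    (57, 0.0708, 0.7220),
    (60, 0.0709, 0.7329),  # final epoch: the figures reported at the top of the card
]

best_by_loss = min(rows, key=lambda r: r[1])      # lowest validation loss
best_by_accuracy = max(rows, key=lambda r: r[2])  # highest accuracy

print(best_by_loss[0])      # 57
print(best_by_accuracy[0])  # 34
```

Which criterion matters depends on the downstream use; with the large gap between the low-loss and high-accuracy epochs here, reporting only the final epoch understates the model's best observed accuracy by about 1.8 points.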

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3