20230823034647

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0699
  • Accuracy: 0.5162

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
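
These settings map directly onto `transformers.TrainingArguments`. Below is a minimal reproduction sketch, not the author's actual script: the card does not state which SuperGLUE subtask was used, so the `rte` configuration and its `premise`/`hypothesis` columns are assumptions, and the Trainer's default cross-entropy objective may differ from whatever produced the loss values reported in this card.

```python
# Minimal reproduction sketch for the hyperparameters listed above.
# ASSUMPTIONS: the SuperGLUE subtask ("rte") and its column names are
# not stated in the card; the loss function is the Trainer default.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("super_glue", "rte")  # assumed subtask
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

def tokenize(batch):
    # RTE provides premise/hypothesis pairs; other subtasks use different columns.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-cased", num_labels=2
)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1),
                            references=labels)

args = TrainingArguments(
    output_dir="out",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60,
    evaluation_strategy="epoch",  # evaluate once per epoch, as in the table below
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```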

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.0734          | 0.4729   |
| 0.1062        | 2.0   | 624   | 0.0755          | 0.5235   |
| 0.1062        | 3.0   | 936   | 0.0930          | 0.4729   |
| 0.0892        | 4.0   | 1248  | 0.0712          | 0.5307   |
| 0.0836        | 5.0   | 1560  | 0.0838          | 0.5307   |
| 0.0836        | 6.0   | 1872  | 0.0706          | 0.4801   |
| 0.0839        | 7.0   | 2184  | 0.0738          | 0.4729   |
| 0.0839        | 8.0   | 2496  | 0.0972          | 0.5271   |
| 0.0838        | 9.0   | 2808  | 0.0804          | 0.5415   |
| 0.0842        | 10.0  | 3120  | 0.0705          | 0.5199   |
| 0.0842        | 11.0  | 3432  | 0.0706          | 0.5848   |
| 0.0832        | 12.0  | 3744  | 0.0776          | 0.4729   |
| 0.0838        | 13.0  | 4056  | 0.0972          | 0.4729   |
| 0.0838        | 14.0  | 4368  | 0.0705          | 0.4838   |
| 0.0824        | 15.0  | 4680  | 0.0725          | 0.4693   |
| 0.0824        | 16.0  | 4992  | 0.0711          | 0.5884   |
| 0.0815        | 17.0  | 5304  | 0.0702          | 0.4729   |
| 0.0827        | 18.0  | 5616  | 0.0707          | 0.5921   |
| 0.0827        | 19.0  | 5928  | 0.0865          | 0.5307   |
| 0.0821        | 20.0  | 6240  | 0.0702          | 0.5235   |
| 0.0817        | 21.0  | 6552  | 0.0822          | 0.4729   |
| 0.0817        | 22.0  | 6864  | 0.0753          | 0.5632   |
| 0.0822        | 23.0  | 7176  | 0.0717          | 0.4729   |
| 0.0822        | 24.0  | 7488  | 0.0702          | 0.4765   |
| 0.0812        | 25.0  | 7800  | 0.0769          | 0.5307   |
| 0.0795        | 26.0  | 8112  | 0.0768          | 0.4729   |
| 0.0795        | 27.0  | 8424  | 0.0852          | 0.4729   |
| 0.0802        | 28.0  | 8736  | 0.0718          | 0.4729   |
| 0.0792        | 29.0  | 9048  | 0.0725          | 0.4729   |
| 0.0792        | 30.0  | 9360  | 0.0706          | 0.5668   |
| 0.0794        | 31.0  | 9672  | 0.0720          | 0.5812   |
| 0.0794        | 32.0  | 9984  | 0.0712          | 0.4801   |
| 0.0791        | 33.0  | 10296 | 0.0711          | 0.4801   |
| 0.0782        | 34.0  | 10608 | 0.0703          | 0.5054   |
| 0.0782        | 35.0  | 10920 | 0.0708          | 0.4838   |
| 0.0778        | 36.0  | 11232 | 0.0716          | 0.4729   |
| 0.0777        | 37.0  | 11544 | 0.0711          | 0.6570   |
| 0.0777        | 38.0  | 11856 | 0.0731          | 0.4729   |
| 0.0775        | 39.0  | 12168 | 0.0714          | 0.4729   |
| 0.0775        | 40.0  | 12480 | 0.0710          | 0.6282   |
| 0.0772        | 41.0  | 12792 | 0.0701          | 0.4765   |
| 0.0773        | 42.0  | 13104 | 0.0701          | 0.5307   |
| 0.0773        | 43.0  | 13416 | 0.0707          | 0.5668   |
| 0.0772        | 44.0  | 13728 | 0.0705          | 0.5848   |
| 0.0773        | 45.0  | 14040 | 0.0701          | 0.5235   |
| 0.0773        | 46.0  | 14352 | 0.0699          | 0.5090   |
| 0.0769        | 47.0  | 14664 | 0.0705          | 0.4765   |
| 0.0769        | 48.0  | 14976 | 0.0699          | 0.5451   |
| 0.0768        | 49.0  | 15288 | 0.0701          | 0.5668   |
| 0.0769        | 50.0  | 15600 | 0.0701          | 0.4765   |
| 0.0769        | 51.0  | 15912 | 0.0699          | 0.5271   |
| 0.0774        | 52.0  | 16224 | 0.0700          | 0.4729   |
| 0.0768        | 53.0  | 16536 | 0.0700          | 0.5126   |
| 0.0768        | 54.0  | 16848 | 0.0702          | 0.5957   |
| 0.0765        | 55.0  | 17160 | 0.0706          | 0.4729   |
| 0.0765        | 56.0  | 17472 | 0.0700          | 0.5379   |
| 0.0766        | 57.0  | 17784 | 0.0700          | 0.5343   |
| 0.0767        | 58.0  | 18096 | 0.0701          | 0.4838   |
| 0.0767        | 59.0  | 18408 | 0.0699          | 0.5054   |
| 0.0766        | 60.0  | 18720 | 0.0699          | 0.5162   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
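
With those versions installed, the checkpoint can be loaded by its repo id. A minimal inference sketch, assuming a standard sequence-classification head (the card does not document the SuperGLUE subtask, input format, or label meanings):

```python
# Minimal loading sketch; the repo id comes from this card's title.
# ASSUMPTION: the pipeline task and input format, since the card does
# not state which SuperGLUE subtask the model was fine-tuned on.
from transformers import pipeline

classifier = pipeline("text-classification", model="dkqjrm/20230823034647")
print(classifier("An example input sentence."))
```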